Harold Soh

NUSense: Robust Soft Optical Tactile Sensor

Oct 30, 2024

Madina Yergibay, Tleukhan Mussin, Saltanat Seitzhan, Daryn Kenzhebek, Zhanat Kappassov, Harold Soh, Tasbolat Taunyazov

Abstract: While most tactile sensors rely on measuring pressure, insights from continuum mechanics suggest that measuring shear strain provides critical information for tactile sensing. In this work, we introduce an optical tactile sensing principle based on shear strain detection. A silicone rubber layer, dyed with color inks, is used to quantify the shear magnitude of the sensing layer. This principle was validated using the NUSense camera-based tactile sensor. The wide-angle camera captures the elongation of the soft pad under mechanical load, a phenomenon attributed to the Poisson effect. The physical and optical properties of the inked pad are essential and should ideally remain stable over time. We tested the robustness of the sensor by subjecting the outermost layer to multiple load cycles using a robot arm. Additionally, we discussed potential applications of this sensor in force sensing and contact localization.

* Madina Yergibay and Tleukhan Mussin contributed equally. 6 pages, 6 figures

Imitation Learning with Limited Actions via Diffusion Planners and Deep Koopman Controllers

Oct 10, 2024

Jianxin Bi, Kelvin Lim, Kaiqi Chen, Yifei Huang, Harold Soh

Abstract: Recent advances in diffusion-based robot policies have demonstrated significant potential in imitating multi-modal behaviors. However, these approaches typically require large quantities of demonstration data paired with corresponding robot action labels, creating a substantial data collection burden. In this work, we propose a plan-then-control framework aimed at improving the action-data efficiency of inverse dynamics controllers by leveraging observational demonstration data. Specifically, we adopt a Deep Koopman Operator framework to model the dynamical system and utilize observation-only trajectories to learn a latent action representation. This latent representation can then be effectively mapped to real high-dimensional continuous actions using a linear action decoder, requiring minimal action-labeled data. Through experiments on simulated robot manipulation tasks and a real robot experiment with multi-modal expert demonstrations, we demonstrate that our approach significantly enhances action-data efficiency and achieves high task success rates with limited action data.

Stochastic Bandits for Egalitarian Assignment

Oct 08, 2024

Eugene Lim, Vincent Y. F. Tan, Harold Soh

Abstract: We study EgalMAB, an egalitarian assignment problem in the context of stochastic multi-armed bandits. In EgalMAB, an agent is tasked with assigning a set of users to arms. At each time step, the agent must assign exactly one arm to each user such that no two users are assigned to the same arm. Subsequently, each user obtains a reward drawn from the unknown reward distribution associated with its assigned arm. The agent's objective is to maximize the minimum expected cumulative reward among all users over a fixed horizon. This problem has applications in areas such as fairness in job and resource allocations, among others. We design and analyze a UCB-based policy EgalUCB and establish upper bounds on the cumulative regret. In complement, we establish an almost-matching policy-independent impossibility result.
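
The problem setup described in this abstract can be illustrated with a small simulation. The sketch below is not the paper's EgalUCB policy, whose details the abstract does not give. It is a hypothetical toy heuristic that assumes Bernoulli rewards, uses the standard UCB1 index to pick as many arms as there are users, and hands the best-looking arm to the user with the lowest cumulative reward so far. The function name `egal_assignment_sketch` and all parameters are made up for illustration.

```python
import math
import random


def egal_assignment_sketch(means, num_users, horizon, seed=0):
    """Toy egalitarian assignment loop (NOT the paper's EgalUCB).

    means: true Bernoulli means of the arms (an assumption; hidden from the policy).
    Returns each user's cumulative reward and each arm's pull count.
    """
    rng = random.Random(seed)
    num_arms = len(means)
    pulls = [0] * num_arms          # times each arm was assigned
    sums = [0.0] * num_arms         # total reward observed per arm
    user_reward = [0.0] * num_users

    for t in range(1, horizon + 1):
        def ucb(a):
            # Standard UCB1 index; unpulled arms are tried first.
            if pulls[a] == 0:
                return float("inf")
            return sums[a] / pulls[a] + math.sqrt(2 * math.log(t) / pulls[a])

        # Pick num_users distinct arms with the highest indices.
        chosen = sorted(range(num_arms), key=ucb, reverse=True)[:num_users]
        # Order chosen arms by empirical mean, best first.
        chosen.sort(key=lambda a: sums[a] / pulls[a] if pulls[a] else 0.0,
                    reverse=True)
        # Poorest user gets the best-looking arm (one arm per user).
        users = sorted(range(num_users), key=lambda u: user_reward[u])
        for u, a in zip(users, chosen):
            r = 1.0 if rng.random() < means[a] else 0.0
            pulls[a] += 1
            sums[a] += r
            user_reward[u] += r

    return user_reward, pulls


rewards, pulls = egal_assignment_sketch([0.9, 0.8, 0.2, 0.1],
                                        num_users=2, horizon=500)
print("minimum cumulative reward across users:", min(rewards))
print("pulls per arm:", pulls)
```

The quantity printed first is the egalitarian objective from the abstract: the cumulative reward of the worst-off user. The rotation rule here is only one plausible way to keep users balanced and carries no regret guarantee.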

Diffusion Meets Options: Hierarchical Generative Skill Composition for Temporally-Extended Tasks

Oct 03, 2024

Zeyu Feng, Hao Luan, Kevin Yuchen Ma, Harold Soh

Abstract: Safe and successful deployment of robots requires not only the ability to generate complex plans but also the capacity to frequently replan and correct execution errors. This paper addresses the challenge of long-horizon trajectory planning under temporally extended objectives in a receding horizon manner. To this end, we propose DOPPLER, a data-driven hierarchical framework that generates and updates plans based on instructions specified in linear temporal logic (LTL). Our method decomposes temporal tasks into a chain of options with hierarchical reinforcement learning from offline non-expert datasets. It leverages diffusion models to generate options with low-level actions. We devise a determinantal-guided posterior sampling technique during batch generation, which improves the speed and diversity of diffusion-generated options, leading to more efficient querying. Experiments on robot navigation and manipulation tasks demonstrate that DOPPLER can generate sequences of trajectories that progressively satisfy the specified formulae for obstacle avoidance and sequential visitation. Demonstration videos are available online at: https://philiptheother.github.io/doppler/.

Language-Guided Manipulation with Diffusion Policies and Constrained Inpainting

Jun 14, 2024

Ce Hao, Kelvin Lin, Siyuan Luo, Harold Soh

Abstract: Diffusion policies have demonstrated robust performance in generative modeling, prompting their application in robotic manipulation controlled via language descriptions. In this paper, we introduce a zero-shot, open-vocabulary diffusion policy method for robot manipulation. Using Vision-Language Models (VLMs), our method transforms linguistic task descriptions into actionable keyframes in 3D space. These keyframes serve to guide the diffusion process via inpainting. However, naively forcing the diffusion process to adhere to the generated keyframes is problematic: the keyframes from the VLMs may be incorrect and lead to out-of-distribution (OOD) action sequences where the diffusion model performs poorly. To address these challenges, we develop an inpainting optimization strategy that balances adherence to the keyframes against adherence to the training data distribution. Experimental evaluations demonstrate that our approach surpasses the performance of traditional fine-tuned language-conditioned methods in both simulated and real-world settings.

Arena 3.0: Advancing Social Navigation in Collaborative and Highly Dynamic Environments

Jun 02, 2024

Linh Kästner, Volodymyir Shcherbyna, Huajian Zeng, Tuan Anh Le, Maximilian Ho-Kyoung Schreff, Halid Osmaev, Nam Truong Tran, Diego Diaz, Jan Golebiowski, Harold Soh (+1 more)

Abstract: Building upon our previous contributions, this paper introduces Arena 3.0, an extension of Arena-Bench, Arena 1.0, and Arena 2.0. Arena 3.0 is a comprehensive software stack containing multiple modules and simulation environments focusing on the development, simulation, and benchmarking of social navigation approaches in collaborative environments. We significantly enhance the realism of human behavior simulation by incorporating a diverse array of new social force models and interaction patterns, encompassing both human-human and human-robot dynamics. The platform provides a comprehensive set of new task modes, designed for extensive benchmarking and testing, and is capable of generating realistic and human-centric environments dynamically, catering to a broad spectrum of social navigation scenarios. In addition, the platform's functionalities have been abstracted across three widely used simulators, each tailored for specific training and testing purposes. The platform's efficacy has been validated through an extensive benchmark and user evaluations of the platform by a global community of researchers and students, who noted the substantial improvement compared to previous versions and expressed interest in using the platform for future research and development. Arena 3.0 is openly available at https://github.com/Arena-Rosnav.

* Robotics Science and Systems 2024, Delft Netherlands
* 11 pages, 6 figures

Out-of-Distribution Detection with a Single Unconditional Diffusion Model

May 20, 2024

Alvin Heng, Alexandre H. Thiery, Harold Soh

Abstract: Out-of-distribution (OOD) detection is a critical task in machine learning that seeks to identify abnormal samples. Traditionally, unsupervised methods utilize a deep generative model for OOD detection. However, such approaches necessitate a different model when evaluating abnormality against a new distribution. With the emergence of foundational generative models, this paper explores whether a single generalist model can also perform OOD detection across diverse tasks. To that end, we introduce our method, Diffusion Paths (DiffPath), in this work. DiffPath proposes to utilize a single diffusion model originally trained to perform unconditional generation for OOD detection. Specifically, we introduce a novel technique of measuring the rate-of-change and curvature of the diffusion paths connecting samples to the standard normal. Extensive experiments show that with a single model, DiffPath outperforms prior work on a variety of OOD tasks involving different distributions. Our code is publicly available at https://github.com/clear-nus/diffpath.

LTLDoG: Satisfying Temporally-Extended Symbolic Constraints for Safe Diffusion-based Planning

May 07, 2024

Zeyu Feng, Hao Luan, Pranav Goyal, Harold Soh

Abstract: Operating effectively in complex environments while complying with specified constraints is crucial for the safe and successful deployment of robots that interact with and operate around people. In this work, we focus on generating long-horizon trajectories that adhere to novel static and temporally-extended constraints/instructions at test time. We propose a data-driven diffusion-based framework, LTLDoG, that modifies the inference steps of the reverse process given an instruction specified using finite linear temporal logic ($\text{LTL}_f$). LTLDoG leverages a satisfaction value function on $\text{LTL}_f$ and guides the sampling steps using its gradient field. This value function can also be trained to generalize to new instructions not observed during training, enabling flexible test-time adaptability. Experiments in robot navigation and manipulation illustrate that the method is able to generate trajectories that satisfy formulae that specify obstacle avoidance and visitation sequences.
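
To make the kind of instruction mentioned in this abstract concrete, the sketch below checks whether a finite trajectory satisfies an $\text{LTL}_f$ formula combining obstacle avoidance with a visitation sequence. It implements only the standard Boolean finite-trace semantics. It is not LTLDoG's learned satisfaction value function or its gradient-based guidance. The tuple encoding of formulas, the function name `holds`, and the proposition names `a`, `b`, and `obstacle` are assumptions made for illustration.

```python
def holds(formula, trace, i=0):
    """Evaluate an LTLf formula on a non-empty finite trace from position i.

    trace: list of sets; trace[k] holds the atomic propositions true at step k.
    formula: nested tuples such as ("always", ("not", ("ap", "obstacle"))).
    """
    op = formula[0]
    n = len(trace)
    if op == "ap":
        return formula[1] in trace[i]
    if op == "not":
        return not holds(formula[1], trace, i)
    if op == "and":
        return holds(formula[1], trace, i) and holds(formula[2], trace, i)
    if op == "next":
        # Strong next: a successor step must exist on a finite trace.
        return i + 1 < n and holds(formula[1], trace, i + 1)
    if op == "eventually":
        return any(holds(formula[1], trace, j) for j in range(i, n))
    if op == "always":
        return all(holds(formula[1], trace, j) for j in range(i, n))
    if op == "until":
        # formula[2] holds at some j >= i, and formula[1] holds at every step before j.
        return any(
            holds(formula[2], trace, j)
            and all(holds(formula[1], trace, k) for k in range(i, j))
            for j in range(i, n)
        )
    raise ValueError(f"unknown operator: {op}")


# "Never touch the obstacle, and visit region a, then later region b."
spec = ("and",
        ("always", ("not", ("ap", "obstacle"))),
        ("eventually", ("and", ("ap", "a"), ("eventually", ("ap", "b")))))

good = [set(), {"a"}, set(), {"b"}]
hits_obstacle = [set(), {"a"}, {"obstacle"}, {"b"}]
wrong_order = [{"b"}, set(), {"a"}]

print(holds(spec, good))           # True
print(holds(spec, hits_obstacle))  # False
print(holds(spec, wrong_order))    # False
```

A Boolean check like this can label trajectories as satisfying or violating a formula, but it gives no gradient. Guiding a diffusion sampler, as the abstract describes, requires a differentiable surrogate, which this sketch does not attempt.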

Octopi: Object Property Reasoning with Large Tactile-Language Models

May 05, 2024

Samson Yu, Kelvin Lin, Anxing Xiao, Jiafei Duan, Harold Soh

Abstract: Physical reasoning is important for effective robot manipulation. Recent work has investigated both vision and language modalities for physical reasoning; vision can reveal information about objects in the environment and language serves as an abstraction and communication medium for additional context. Although these works have demonstrated success on a variety of physical reasoning tasks, they are limited to physical properties that can be inferred from visual or language inputs. In this work, we investigate combining tactile perception with language, which enables embodied systems to obtain physical properties through interaction and apply common-sense reasoning. We contribute a new dataset PhysiCleAR, which comprises both physical/property reasoning tasks and annotated tactile videos obtained using a GelSight tactile sensor. We then introduce Octopi, a system that leverages both tactile representation learning and large vision-language models to predict and reason about tactile inputs with minimal language fine-tuning. Our evaluations on PhysiCleAR show that Octopi is able to effectively use intermediate physical property predictions to improve physical reasoning both in trained tasks and in zero-shot reasoning. PhysiCleAR and Octopi are available at https://github.com/clear-nus/octopi.

* 17 pages

Extract, Define, Canonicalize: An LLM-based Framework for Knowledge Graph Construction

Apr 05, 2024

Bowen Zhang, Harold Soh

Abstract: In this work, we are interested in automated methods for knowledge graph creation (KGC) from input text. Progress on large language models (LLMs) has prompted a series of recent works applying them to KGC, e.g., via zero/few-shot prompting. Despite successes on small domain-specific datasets, these models face difficulties scaling up to text common in many real-world applications. A principal issue is that in prior methods, the KG schema has to be included in the LLM prompt to generate valid triplets; larger and more complex schemas easily exceed the LLMs' context window length. To address this problem, we propose a three-phase framework named Extract-Define-Canonicalize (EDC): open information extraction followed by schema definition and post-hoc canonicalization. EDC is flexible in that it can be applied both when a pre-defined target schema is available and when it is not; in the latter case, it constructs a schema automatically and applies self-canonicalization. To further improve performance, we introduce a trained component that retrieves schema elements relevant to the input text; this improves the LLMs' extraction performance in a retrieval-augmented generation-like manner. We demonstrate on three KGC benchmarks that EDC is able to extract high-quality triplets without any parameter tuning and with significantly larger schemas compared to prior works.

* 15 pages, 2 figures
