Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Rui Chen

SPARK: A Modular Benchmark for Humanoid Robot Safety

Feb 05, 2025

Yifan Sun, Rui Chen, Kai S. Yun, Yikuan Fang, Sebin Jung, Feihan Li, Bowei Li, Weiye Zhao, Changliu Liu

Abstract:This paper introduces the Safe Protective and Assistive Robot Kit (SPARK), a comprehensive benchmark designed to ensure safety in humanoid autonomy and teleoperation. Humanoid robots pose significant safety risks due to their physical capabilities of interacting with complex environments. The physical structures of humanoid robots further add complexity to the design of general safety solutions. To facilitate the safe deployment of complex robot systems, SPARK can be used as a toolbox that comes with state-of-the-art safe control algorithms in a modular and composable robot control framework. Users can easily configure safety criteria and sensitivity levels to optimize the balance between safety and performance. To accelerate humanoid safety research and development, SPARK provides a simulation benchmark that compares safety approaches in a variety of environments, tasks, and robot models. Furthermore, SPARK allows quick deployment of synthesized safe controllers on real robots. For hardware deployment, SPARK supports Apple Vision Pro (AVP) or a Motion Capture System as external sensors, while also offering interfaces for seamless integration with alternative hardware setups. This paper demonstrates SPARK's capability with both simulation experiments and case studies with a Unitree G1 humanoid robot. Leveraging these advantages of SPARK, users and researchers can significantly improve the safety of their humanoid systems as well as accelerate relevant research. The open-source code is available at https://github.com/intelligent-control-lab/spark.

Via

Access Paper or Ask Questions

Dexterous Safe Control for Humanoids in Cluttered Environments via Projected Safe Set Algorithm

Feb 05, 2025

Rui Chen, Yifan Sun, Changliu Liu

Abstract:It is critical to ensure safety for humanoid robots in real-world applications without compromising performance. In this paper, we consider the problem of dexterous safety, featuring limb-level geometry constraints for avoiding both external and self-collisions in cluttered environments. Compared to safety with simplified bounding geometries in sprase environments, dexterous safety produces numerous constraints which often lead to infeasible constraint sets when solving for safe robot control. To address this issue, we propose Projected Safe Set Algorithm (p-SSA), an extension of classical safe control algorithms to multi-constraint cases. p-SSA relaxes conflicting constraints in a principled manner, minimizing safety violations to guarantee feasible robot control. We verify our approach in simulation and on a real Unitree G1 humanoid robot performing complex collision avoidance tasks. Results show that p-SSA enables the humanoid to operate robustly in challenging situations with minimal safety violations and directly generalizes to various tasks with zero parameter tuning.

Via

Access Paper or Ask Questions

ThinTact:Thin Vision-Based Tactile Sensor by Lensless Imaging

Jan 16, 2025

Jing Xu, Weihang Chen, Hongyu Qian, Dan Wu, Rui Chen

Figure 1 for ThinTact:Thin Vision-Based Tactile Sensor by Lensless Imaging

Figure 2 for ThinTact:Thin Vision-Based Tactile Sensor by Lensless Imaging

Figure 3 for ThinTact:Thin Vision-Based Tactile Sensor by Lensless Imaging

Figure 4 for ThinTact:Thin Vision-Based Tactile Sensor by Lensless Imaging

Abstract:Vision-based tactile sensors have drawn increasing interest in the robotics community. However, traditional lens-based designs impose minimum thickness constraints on these sensors, limiting their applicability in space-restricted settings. In this paper, we propose ThinTact, a novel lensless vision-based tactile sensor with a sensing field of over 200 mm2 and a thickness of less than 10 mm.ThinTact utilizes the mask-based lensless imaging technique to map the contact information to CMOS signals. To ensure real-time tactile sensing, we propose a real-time lensless reconstruction algorithm that leverages a frequency-spatial-domain joint filter based on discrete cosine transform (DCT). This algorithm achieves computation significantly faster than existing optimization-based methods. Additionally, to improve the sensing quality, we develop a mask optimization method based on the generic algorithm and the corresponding system matrix calibration algorithm.We evaluate the performance of our proposed lensless reconstruction and tactile sensing through qualitative and quantitative experiments. Furthermore, we demonstrate ThinTact's practical applicability in diverse applications, including texture recognition and contact-rich object manipulation. The paper will appear in the IEEE Transactions on Robotics: https://ieeexplore.ieee.org/document/10842357. Video: https://youtu.be/YrOO9BDMAHo

* \c{opyright} 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Via

Access Paper or Ask Questions

Supervision-free Vision-Language Alignment

Jan 08, 2025

Giorgio Giannone, Ruoteng Li, Qianli Feng, Evgeny Perevodchikov, Rui Chen, Aleix Martinez

Abstract:Vision-language models (VLMs) have demonstrated remarkable potential in integrating visual and linguistic information, but their performance is often constrained by the need for extensive, high-quality image-text training data. Curation of these image-text pairs is both time-consuming and computationally expensive. To address this challenge, we introduce SVP (Supervision-free Visual Projection), a novel framework that enhances vision-language alignment without relying on curated data or preference annotation. SVP leverages self-captioning and a pre-trained grounding model as a feedback mechanism to elicit latent information in VLMs. We evaluate our approach across six key areas: captioning, referring, visual question answering, multitasking, hallucination control, and object recall. Results demonstrate significant improvements, including a 14% average improvement in captioning tasks, up to 12% increase in object recall, and substantial reduction in hallucination rates. Notably, a small VLM using SVP achieves hallucination reductions comparable to a model five times larger, while a VLM with initially poor referring capabilities more than doubles its performance, approaching parity with a model twice its size.

* Preprint

Via

Access Paper or Ask Questions

From Pairwise to Ranking: Climbing the Ladder to Ideal Collaborative Filtering with Pseudo-Ranking

Dec 24, 2024

Yuhan Zhao, Rui Chen, Li Chen, Shuang Zhang, Qilong Han, Hongtao Song

Figure 1 for From Pairwise to Ranking: Climbing the Ladder to Ideal Collaborative Filtering with Pseudo-Ranking

Figure 2 for From Pairwise to Ranking: Climbing the Ladder to Ideal Collaborative Filtering with Pseudo-Ranking

Figure 3 for From Pairwise to Ranking: Climbing the Ladder to Ideal Collaborative Filtering with Pseudo-Ranking

Figure 4 for From Pairwise to Ranking: Climbing the Ladder to Ideal Collaborative Filtering with Pseudo-Ranking

Abstract:Intuitively, an ideal collaborative filtering (CF) model should learn from users' full rankings over all items to make optimal top-K recommendations. Due to the absence of such full rankings in practice, most CF models rely on pairwise loss functions to approximate full rankings, resulting in an immense performance gap. In this paper, we provide a novel analysis using the multiple ordinal classification concept to reveal the inevitable gap between a pairwise approximation and the ideal case. However, bridging the gap in practice encounters two formidable challenges: (1) none of the real-world datasets contains full ranking information; (2) there does not exist a loss function that is capable of consuming ranking information. To overcome these challenges, we propose a pseudo-ranking paradigm (PRP) that addresses the lack of ranking information by introducing pseudo-rankings supervised by an original noise injection mechanism. Additionally, we put forward a new ranking loss function designed to handle ranking information effectively. To ensure our method's robustness against potential inaccuracies in pseudo-rankings, we equip the ranking loss function with a gradient-based confidence mechanism to detect and mitigate abnormal gradients. Extensive experiments on four real-world datasets demonstrate that PRP significantly outperforms state-of-the-art methods.

Via

Access Paper or Ask Questions

Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

Dec 24, 2024

Rui Chen, Jianfeng Zhang, Yixun Liang, Guan Luo, Weiyu Li, Jiarui Liu, Xiu Li, Xiaoxiao Long, Jiashi Feng, Ping Tan

Figure 1 for Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

Figure 2 for Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

Figure 3 for Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

Figure 4 for Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

Abstract:Recent 3D content generation pipelines commonly employ Variational Autoencoders (VAEs) to encode shapes into compact latent representations for diffusion-based generation. However, the widely adopted uniform point sampling strategy in Shape VAE training often leads to a significant loss of geometric details, limiting the quality of shape reconstruction and downstream generation tasks. We present Dora-VAE, a novel approach that enhances VAE reconstruction through our proposed sharp edge sampling strategy and a dual cross-attention mechanism. By identifying and prioritizing regions with high geometric complexity during training, our method significantly improves the preservation of fine-grained shape features. Such sampling strategy and the dual attention mechanism enable the VAE to focus on crucial geometric details that are typically missed by uniform sampling approaches. To systematically evaluate VAE reconstruction quality, we additionally propose Dora-bench, a benchmark that quantifies shape complexity through the density of sharp edges, introducing a new metric focused on reconstruction accuracy at these salient geometric features. Extensive experiments on the Dora-bench demonstrate that Dora-VAE achieves comparable reconstruction quality to the state-of-the-art dense XCube-VAE while requiring a latent space at least 8$\times$ smaller (1,280 vs. > 10,000 codes). We will release our code and benchmark dataset to facilitate future research in 3D shape modeling.

* Project page: https://aruichen.github.io/Dora/

Via

Access Paper or Ask Questions

Unlocking the Hidden Treasures: Enhancing Recommendations with Unlabeled Data

Dec 24, 2024

Yuhan Zhao, Rui Chen, Qilong Han, Hongtao Song, Li Chen

Figure 1 for Unlocking the Hidden Treasures: Enhancing Recommendations with Unlabeled Data

Figure 2 for Unlocking the Hidden Treasures: Enhancing Recommendations with Unlabeled Data

Figure 3 for Unlocking the Hidden Treasures: Enhancing Recommendations with Unlabeled Data

Figure 4 for Unlocking the Hidden Treasures: Enhancing Recommendations with Unlabeled Data

Abstract:Collaborative filtering (CF) stands as a cornerstone in recommender systems, yet effectively leveraging the massive unlabeled data presents a significant challenge. Current research focuses on addressing the challenge of unlabeled data by extracting a subset that closely approximates negative samples. Regrettably, the remaining data are overlooked, failing to fully integrate this valuable information into the construction of user preferences. To address this gap, we introduce a novel positive-neutral-negative (PNN) learning paradigm. PNN introduces a neutral class, encompassing intricate items that are challenging to categorize directly as positive or negative samples. By training a model based on this triple-wise partial ranking, PNN offers a promising solution to learning complex user preferences. Through theoretical analysis, we connect PNN to one-way partial AUC (OPAUC) to validate its efficacy. Implementing the PNN paradigm is, however, technically challenging because: (1) it is difficult to classify unlabeled data into neutral or negative in the absence of supervised signals; (2) there does not exist any loss function that can handle set-level triple-wise ranking relationships. To address these challenges, we propose a semi-supervised learning method coupled with a user-aware attention model for knowledge acquisition and classification refinement. Additionally, a novel loss function with a two-step centroid ranking approach enables handling set-level rankings. Extensive experiments on four real-world datasets demonstrate that, when combined with PNN, a wide range of representative CF models can consistently and significantly boost their performance. Even with a simple matrix factorization, PNN can achieve comparable performance to sophisticated graph neutral networks.

Via

Access Paper or Ask Questions

Balanced 3DGS: Gaussian-wise Parallelism Rendering with Fine-Grained Tiling

Dec 23, 2024

Hao Gui, Lin Hu, Rui Chen, Mingxiao Huang, Yuxin Yin, Jin Yang, Yong Wu

Figure 1 for Balanced 3DGS: Gaussian-wise Parallelism Rendering with Fine-Grained Tiling

Figure 2 for Balanced 3DGS: Gaussian-wise Parallelism Rendering with Fine-Grained Tiling

Figure 3 for Balanced 3DGS: Gaussian-wise Parallelism Rendering with Fine-Grained Tiling

Figure 4 for Balanced 3DGS: Gaussian-wise Parallelism Rendering with Fine-Grained Tiling

Abstract:3D Gaussian Splatting (3DGS) is increasingly attracting attention in both academia and industry owing to its superior visual quality and rendering speed. However, training a 3DGS model remains a time-intensive task, especially in load imbalance scenarios where workload diversity among pixels and Gaussian spheres causes poor renderCUDA kernel performance. We introduce Balanced 3DGS, a Gaussian-wise parallelism rendering with fine-grained tiling approach in 3DGS training process, perfectly solving load-imbalance issues. First, we innovatively introduce the inter-block dynamic workload distribution technique to map workloads to Streaming Multiprocessor(SM) resources within a single GPU dynamically, which constitutes the foundation of load balancing. Second, we are the first to propose the Gaussian-wise parallel rendering technique to significantly reduce workload divergence inside a warp, which serves as a critical component in addressing load imbalance. Based on the above two methods, we further creatively put forward the fine-grained combined load balancing technique to uniformly distribute workload across all SMs, which boosts the forward renderCUDA kernel performance by up to 7.52x. Besides, we present a self-adaptive render kernel selection strategy during the 3DGS training process based on different load-balance situations, which effectively improves training efficiency.

Via

Access Paper or Ask Questions

Fast inverse lithography based on a model-driven block stacking convolutional neural network

Dec 19, 2024

Ruixiang Chen, Yang Zhao, Haoqin Li, Rui Chen

Abstract:In the realm of lithography, Optical Proximity Correction (OPC) is a crucial resolution enhancement technique that optimizes the transmission function of photomasks on a pixel-based to effectively counter Optical Proximity Effects (OPE). However, conventional pixel-based OPC methods often generate patterns that pose manufacturing challenges, thereby leading to the increased cost in practical scenarios. This paper presents a novel inverse lithographic approach to OPC, employing a model-driven, block stacking deep learning framework that expedites the generation of masks conducive to manufacturing. This method is founded on vector lithography modelling and streamlines the training process by eliminating the requirement for extensive labeled datasets. Furthermore, diversity of mask patterns is enhanced by employing a wave function collapse algorithm, which facilitates the random generation of a multitude of target patterns, therefore significantly expanding the range of mask paradigm. Numerical experiments have substantiated the efficacy of the proposed end-to-end approach, highlighting its superior capability to manage mask complexity within the context of advanced OPC lithography. This advancement is anticipated to enhance the feasibility and economic viability of OPC technology within actual manufacturing environments.

* 21 pages, 7 figures

Via

Access Paper or Ask Questions

UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving

Dec 06, 2024

Rui Chen, Zehuan Wu, Yichen Liu, Yuxin Guo, Jingcheng Ni, Haifeng Xia, Siyu Xia

Abstract:The creation of diverse and realistic driving scenarios has become essential to enhance perception and planning capabilities of the autonomous driving system. However, generating long-duration, surround-view consistent driving videos remains a significant challenge. To address this, we present UniMLVG, a unified framework designed to generate extended street multi-perspective videos under precise control. By integrating single- and multi-view driving videos into the training data, our approach updates cross-frame and cross-view modules across three stages with different training objectives, substantially boosting the diversity and quality of generated visual content. Additionally, we employ the explicit viewpoint modeling in multi-view video generation to effectively improve motion transition consistency. Capable of handling various input reference formats (e.g., text, images, or video), our UniMLVG generates high-quality multi-view videos according to the corresponding condition constraints such as 3D bounding boxes or frame-level text descriptions. Compared to the best models with similar capabilities, our framework achieves improvements of 21.4% in FID and 36.5% in FVD.

Via

Access Paper or Ask Questions