Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhiqiang Tang

Origami-Inspired Soft Gripper with Tunable Constant Force Output

Mar 03, 2025

Zhenwei Ni, Chang Xu, Zhihang Qin, Ceng Zhang, Zhiqiang Tang, Peiyi Wang, Cecilia Laschi

Abstract:Soft robotic grippers gently and safely manipulate delicate objects due to their inherent adaptability and softness. Limited by insufficient stiffness and imprecise force control, conventional soft grippers are not suitable for applications that require stable grasping force. In this work, we propose a soft gripper that utilizes an origami-inspired structure to achieve tunable constant force output over a wide strain range. The geometry of each taper panel is established to provide necessary parameters such as protrusion distance, taper angle, and crease thickness required for 3D modeling and FEA analysis. Simulations and experiments show that by optimizing these parameters, our design can achieve a tunable constant force output. Moreover, the origami-inspired soft gripper dynamically adapts to different shapes while preventing excessive forces, with potential applications in logistics, manufacturing, and other industrial settings that require stable and adaptive operations

* 7 pages, 8 figures, conference

Via

Access Paper or Ask Questions

Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data

Dec 19, 2024

Zhiqiang Tang, Zihan Zhong, Tong He, Gerald Friedland

Figure 1 for Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data

Figure 2 for Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data

Figure 3 for Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data

Figure 4 for Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data

Abstract:This paper studies the best practices for automatic machine learning (AutoML). While previous AutoML efforts have predominantly focused on unimodal data, the multimodal aspect remains under-explored. Our study delves into classification and regression problems involving flexible combinations of image, text, and tabular data. We curate a benchmark comprising 22 multimodal datasets from diverse real-world applications, encompassing all 4 combinations of the 3 modalities. Across this benchmark, we scrutinize design choices related to multimodal fusion strategies, multimodal data augmentation, converting tabular data into text, cross-modal alignment, and handling missing modalities. Through extensive experimentation and analysis, we distill a collection of effective strategies and consolidate them into a unified pipeline, achieving robust performance on diverse datasets.

Via

Access Paper or Ask Questions

Octopus-Swimming-Like Robot with Soft Asymmetric Arms

Oct 15, 2024

Bobing Zhang, Yiyuan Zhang, Yiming Li, Sicheng Xuan, Hong Wei Ng, Yuliang Liufu, Zhiqiang Tang, Cecilia Laschi

Figure 1 for Octopus-Swimming-Like Robot with Soft Asymmetric Arms

Figure 2 for Octopus-Swimming-Like Robot with Soft Asymmetric Arms

Figure 3 for Octopus-Swimming-Like Robot with Soft Asymmetric Arms

Figure 4 for Octopus-Swimming-Like Robot with Soft Asymmetric Arms

Abstract:Underwater vehicles have seen significant development over the past seventy years. However, bio-inspired propulsion robots are still in their early stages and require greater interdisciplinary collaboration between biologists and roboticists. The octopus, one of the most intelligent marine animals, exhibits remarkable abilities such as camouflaging, exploring, and hunting while swimming with its arms. Although bio-inspired robotics researchers have aimed to replicate these abilities, the complexity of designing an eight-arm bionic swimming platform has posed challenges from the beginning. In this work, we propose a novel bionic robot swimming platform that combines asymmetric passive morphing arms with an umbrella-like quick-return mechanism. Using only two simple constant-speed motors, this design achieves efficient swimming by replicating octopus-like arm movements and stroke time ratios. The robot reached a peak speed of 314 mm/s during its second power stroke. This design reduces the complexity of traditional octopus-like swimming robot actuation systems while maintaining good swimming performance. It offers a more achievable and efficient platform for biologists and roboticists conducting more profound octopus-inspired robotic and biological studies.

Via

Access Paper or Ask Questions

Learning to Generate Answers with Citations via Factual Consistency Models

Jun 19, 2024

Rami Aly, Zhiqiang Tang, Samson Tan, George Karypis

Abstract:Large Language Models (LLMs) frequently hallucinate, impeding their reliability in mission-critical situations. One approach to address this issue is to provide citations to relevant sources alongside generated content, enhancing the verifiability of generations. However, citing passages accurately in answers remains a substantial challenge. This paper proposes a weakly-supervised fine-tuning method leveraging factual consistency models (FCMs). Our approach alternates between generating texts with citations and supervised fine-tuning with FCM-filtered citation data. Focused learning is integrated into the objective, directing the fine-tuning process to emphasise the factual unit tokens, as measured by an FCM. Results on the ALCE few-shot citation benchmark with various instruction-tuned LLMs demonstrate superior performance compared to in-context learning, vanilla supervised fine-tuning, and state-of-the-art methods, with an average improvement of $34.1$, $15.5$, and $10.5$ citation F$_1$ points, respectively. Moreover, in a domain transfer setting we show that the obtained citation generation ability robustly transfers to unseen datasets. Notably, our citation improvements contribute to the lowest factual error rate across baselines.

* Accepted to ACL 2024. Code release will follow

Via

Access Paper or Ask Questions

AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

Apr 30, 2024

Zhiqiang Tang, Haoyang Fang, Su Zhou, Taojiannan Yang, Zihan Zhong, Tony Hu, Katrin Kirchhoff, George Karypis

Figure 1 for AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

Figure 2 for AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

Figure 3 for AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

Figure 4 for AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

Abstract:AutoGluon-Multimodal (AutoMM) is introduced as an open-source AutoML library designed specifically for multimodal learning. Distinguished by its exceptional ease of use, AutoMM enables fine-tuning of foundation models with just three lines of code. Supporting various modalities including image, text, and tabular data, both independently and in combination, the library offers a comprehensive suite of functionalities spanning classification, regression, object detection, semantic matching, and image segmentation. Experiments across diverse datasets and tasks showcases AutoMM's superior performance in basic classification and regression tasks compared to existing AutoML tools, while also demonstrating competitive results in advanced tasks, aligning with specialized toolboxes designed for such purposes.

* Accepted at AutoML 2024 Conference

Via

Access Paper or Ask Questions

Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model

Jan 31, 2024

Zihan Zhong, Zhiqiang Tang, Tong He, Haoyang Fang, Chun Yuan

Figure 1 for Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model

Figure 2 for Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model

Figure 3 for Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model

Figure 4 for Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model

Abstract:The Segment Anything Model (SAM) stands as a foundational framework for image segmentation. While it exhibits remarkable zero-shot generalization in typical scenarios, its advantage diminishes when applied to specialized domains like medical imagery and remote sensing. To address this limitation, this paper introduces Conv-LoRA, a simple yet effective parameter-efficient fine-tuning approach. By integrating ultra-lightweight convolutional parameters into Low-Rank Adaptation (LoRA), Conv-LoRA can inject image-related inductive biases into the plain ViT encoder, further reinforcing SAM's local prior assumption. Notably, Conv-LoRA not only preserves SAM's extensive segmentation knowledge but also revives its capacity of learning high-level image semantics, which is constrained by SAM's foreground-background segmentation pretraining. Comprehensive experimentation across diverse benchmarks spanning multiple domains underscores Conv-LoRA's superiority in adapting SAM to real-world semantic segmentation tasks.

* Accepted at ICLR 2024 Conference

Via

Access Paper or Ask Questions

Learning Multimodal Data Augmentation in Feature Space

Dec 29, 2022

Zichang Liu, Zhiqiang Tang, Xingjian Shi, Aston Zhang, Mu Li, Anshumali Shrivastava, Andrew Gordon Wilson

Figure 1 for Learning Multimodal Data Augmentation in Feature Space

Figure 2 for Learning Multimodal Data Augmentation in Feature Space

Figure 3 for Learning Multimodal Data Augmentation in Feature Space

Figure 4 for Learning Multimodal Data Augmentation in Feature Space

Abstract:The ability to jointly learn from multiple modalities, such as text, audio, and visual data, is a defining feature of intelligent systems. While there have been promising advances in designing neural networks to harness multimodal data, the enormous success of data augmentation currently remains limited to single-modality tasks like image classification. Indeed, it is particularly difficult to augment each modality while preserving the overall semantic structure of the data; for example, a caption may no longer be a good description of an image after standard augmentations have been applied, such as translation. Moreover, it is challenging to specify reasonable transformations that are not tailored to a particular modality. In this paper, we introduce LeMDA, Learning Multimodal Data Augmentation, an easy-to-use method that automatically learns to jointly augment multimodal data in feature space, with no constraints on the identities of the modalities or the relationship between modalities. We show that LeMDA can (1) profoundly improve the performance of multimodal deep learning architectures, (2) apply to combinations of modalities that have not been previously considered, and (3) achieve state-of-the-art results on a wide range of applications comprised of image, text, and tabular data.

Via

Access Paper or Ask Questions

Are Multimodal Models Robust to Image and Text Perturbations?

Dec 15, 2022

Jielin Qiu, Yi Zhu, Xingjian Shi, Florian Wenzel, Zhiqiang Tang, Ding Zhao, Bo Li, Mu Li

Figure 1 for Are Multimodal Models Robust to Image and Text Perturbations?

Figure 2 for Are Multimodal Models Robust to Image and Text Perturbations?

Figure 3 for Are Multimodal Models Robust to Image and Text Perturbations?

Figure 4 for Are Multimodal Models Robust to Image and Text Perturbations?

Abstract:Multimodal image-text models have shown remarkable performance in the past few years. However, evaluating their robustness against distribution shifts is crucial before adopting them in real-world applications. In this paper, we investigate the robustness of 9 popular open-sourced image-text models under common perturbations on five tasks (image-text retrieval, visual reasoning, visual entailment, image captioning, and text-to-image generation). In particular, we propose several new multimodal robustness benchmarks by applying 17 image perturbation and 16 text perturbation techniques on top of existing datasets. We observe that multimodal models are not robust to image and text perturbations, especially to image perturbations. Among the tested perturbation methods, character-level perturbations constitute the most severe distribution shift for text, and zoom blur is the most severe shift for image data. We also introduce two new robustness metrics (MMI and MOR) for proper evaluations of multimodal models. We hope our extensive study sheds light on new directions for the development of robust multimodal models.

* The project webpage is at: https://mmrobustness.github.io/

Via

Access Paper or Ask Questions

Visual Prompt Tuning for Test-time Domain Adaptation

Oct 10, 2022

Yunhe Gao, Xingjian Shi, Yi Zhu, Hao Wang, Zhiqiang Tang, Xiong Zhou, Mu Li, Dimitris N. Metaxas

Figure 1 for Visual Prompt Tuning for Test-time Domain Adaptation

Figure 2 for Visual Prompt Tuning for Test-time Domain Adaptation

Figure 3 for Visual Prompt Tuning for Test-time Domain Adaptation

Figure 4 for Visual Prompt Tuning for Test-time Domain Adaptation

Abstract:Models should have the ability to adapt to unseen data during test-time to avoid performance drop caused by inevitable distribution shifts in real-world deployment scenarios. In this work, we tackle the practical yet challenging test-time adaptation (TTA) problem, where a model adapts to the target domain without accessing the source data. We propose a simple recipe called data-efficient prompt tuning (DePT) with two key ingredients. First, DePT plugs visual prompts into the vision Transformer and only tunes these source-initialized prompts during adaptation. We find such parameter-efficient finetuning can efficiently adapt the model representation to the target domain without overfitting to the noise in the learning objective. Second, DePT bootstraps the source representation to the target domain by memory bank-based online pseudo labeling. A hierarchical self-supervised regularization specially designed for prompts is jointly optimized to alleviate error accumulation during self-training. With much fewer tunable parameters, DePT demonstrates not only state-of-the-art performance on major adaptation benchmarks, but also superior data efficiency, i.e., adaptation with only 1\% or 10\% data without much performance degradation compared to 100\% data. In addition, DePT is also versatile to be extended to online or multi-source TTA settings.

Via

Access Paper or Ask Questions

Enabling Data Diversity: Efficient Automatic Augmentation via Regularized Adversarial Training

Mar 30, 2021

Yunhe Gao, Zhiqiang Tang, Mu Zhou, Dimitris Metaxas

Figure 1 for Enabling Data Diversity: Efficient Automatic Augmentation via Regularized Adversarial Training

Figure 2 for Enabling Data Diversity: Efficient Automatic Augmentation via Regularized Adversarial Training

Figure 3 for Enabling Data Diversity: Efficient Automatic Augmentation via Regularized Adversarial Training

Figure 4 for Enabling Data Diversity: Efficient Automatic Augmentation via Regularized Adversarial Training

Abstract:Data augmentation has proved extremely useful by increasing training data variance to alleviate overfitting and improve deep neural networks' generalization performance. In medical image analysis, a well-designed augmentation policy usually requires much expert knowledge and is difficult to generalize to multiple tasks due to the vast discrepancies among pixel intensities, image appearances, and object shapes in different medical tasks. To automate medical data augmentation, we propose a regularized adversarial training framework via two min-max objectives and three differentiable augmentation models covering affine transformation, deformation, and appearance changes. Our method is more automatic and efficient than previous automatic augmentation methods, which still rely on pre-defined operations with human-specified ranges and costly bi-level optimization. Extensive experiments demonstrated that our approach, with less training overhead, achieves superior performance over state-of-the-art auto-augmentation methods on both tasks of 2D skin cancer classification and 3D organs-at-risk segmentation.

* Accepted by IPMI 2021

Via

Access Paper or Ask Questions