Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Nanning Zheng

Xi'an Jiaotong University

Skill-Based Few-Shot Selection for In-Context Learning

May 23, 2023

Shengnan An, Bo Zhou, Zeqi Lin, Qiang Fu, Bei Chen, Nanning Zheng, Weizhu Chen, Jian-Guang Lou

Figure 1 for Skill-Based Few-Shot Selection for In-Context Learning

Figure 2 for Skill-Based Few-Shot Selection for In-Context Learning

Figure 3 for Skill-Based Few-Shot Selection for In-Context Learning

Figure 4 for Skill-Based Few-Shot Selection for In-Context Learning

Abstract:In-Context learning is the paradigm that adapts large language models to downstream tasks by providing a few examples. Few-shot selection -- selecting appropriate examples for each test instance separately -- is important for in-context learning. In this paper, we propose Skill-KNN, a skill-based few-shot selection method for in-context learning. The key advantages of Skill-KNN include: (1) it addresses the problem that existing methods based on pre-trained embeddings can be easily biased by surface natural language features that are not important for the target task; (2) it does not require training or fine-tuning of any models, making it suitable for frequently expanding or changing example banks. The key insight is to optimize the inputs fed into the embedding model, rather than tuning the model itself. Technically, Skill-KNN generates the skill-based representations for each test case and candidate example by utilizing a pre-processing few-shot prompting, thus eliminating unimportant surface features. Experimental results across four cross-domain semantic parsing tasks and four backbone models show that Skill-KNN significantly outperforms existing methods.

* 18 pages, 6 figures

Via

Access Paper or Ask Questions

Milestones in Autonomous Driving and Intelligent Vehicles Part \uppercase\expandafter{\romannumeral1}: Control, Computing System Design, Communication, HD Map, Testing, and Human Behaviors

May 12, 2023

Long Chen, Yuchen Li, Chao Huang, Yang Xing, Daxin Tian, Li Li, Zhongxu Hu, Siyu Teng, Chen Lv, Jinjun Wang(+3 more)

$Figure 1 for Milestones in Autonomous Driving and Intelligent Vehicles Part \uppercase\expandafter{\romannumeral1}: Control, Computing System Design, Communication, HD Map, Testing, and Human Behaviors$

$Figure 2 for Milestones in Autonomous Driving and Intelligent Vehicles Part \uppercase\expandafter{\romannumeral1}: Control, Computing System Design, Communication, HD Map, Testing, and Human Behaviors$

$Figure 3 for Milestones in Autonomous Driving and Intelligent Vehicles Part \uppercase\expandafter{\romannumeral1}: Control, Computing System Design, Communication, HD Map, Testing, and Human Behaviors$

$Figure 4 for Milestones in Autonomous Driving and Intelligent Vehicles Part \uppercase\expandafter{\romannumeral1}: Control, Computing System Design, Communication, HD Map, Testing, and Human Behaviors$

Abstract:Interest in autonomous driving (AD) and intelligent vehicles (IVs) is growing at a rapid pace due to the convenience, safety, and economic benefits. Although a number of surveys have reviewed research achievements in this field, they are still limited in specific tasks and lack systematic summaries and research directions in the future. Our work is divided into 3 independent articles and the first part is a Survey of Surveys (SoS) for total technologies of AD and IVs that involves the history, summarizes the milestones, and provides the perspectives, ethics, and future research directions. This is the second part (Part \uppercase\expandafter{\romannumeral1} for this technical survey) to review the development of control, computing system design, communication, High Definition map (HD map), testing, and human behaviors in IVs. In addition, the third part (Part \uppercase\expandafter{\romannumeral2} for this technical survey) is to review the perception and planning sections. The objective of this paper is to involve all the sections of AD, summarize the latest technical milestones, and guide abecedarians to quickly understand the development of AD and IVs. Combining the SoS and Part \uppercase\expandafter{\romannumeral2}, we anticipate that this work will bring novel and diverse insights to researchers and abecedarians, and serve as a bridge between past and future.

* IEEE SMC: Systems
* 18 pages, 4 figures, 3 tables

Via

Access Paper or Ask Questions

How Do In-Context Examples Affect Compositional Generalization?

May 08, 2023

Shengnan An, Zeqi Lin, Qiang Fu, Bei Chen, Nanning Zheng, Jian-Guang Lou, Dongmei Zhang

Abstract:Compositional generalization--understanding unseen combinations of seen primitives--is an essential reasoning capability in human intelligence. The AI community mainly studies this capability by fine-tuning neural networks on lots of training samples, while it is still unclear whether and how in-context learning--the prevailing few-shot paradigm based on large language models--exhibits compositional generalization. In this paper, we present CoFe, a test suite to investigate in-context compositional generalization. We find that the compositional generalization performance can be easily affected by the selection of in-context examples, thus raising the research question what the key factors are to make good in-context examples for compositional generalization. We study three potential factors: similarity, diversity and complexity. Our systematic experiments indicate that in-context examples should be structurally similar to the test case, diverse from each other, and individually simple. Furthermore, two strong limitations are observed: in-context compositional generalization on fictional words is much weaker than that on commonly used ones; it is still critical that the in-context examples should cover required linguistic structures, even though the backbone model has been pre-trained on large corpus. We hope our analysis would facilitate the understanding and utilization of in-context learning paradigm.

* ACL 2023, long paper

Via

Access Paper or Ask Questions

Learning Trajectories are Generalization Indicators

May 04, 2023

Jingwen Fu, Zhizheng Zhang, Dacheng Yin, Yan Lu, Nanning Zheng

Figure 1 for Learning Trajectories are Generalization Indicators

Figure 2 for Learning Trajectories are Generalization Indicators

Figure 3 for Learning Trajectories are Generalization Indicators

Figure 4 for Learning Trajectories are Generalization Indicators

Abstract:The aim of this paper is to investigate the connection between learning trajectories of the Deep Neural Networks (DNNs) and their corresponding generalization capabilities when being optimized with broadly used gradient descent and stochastic gradient descent algorithms. In this paper, we construct Linear Approximation Function to model the trajectory information and we propose a new generalization bound with richer trajectory information based on it. Our proposed generalization bound relies on the complexity of learning trajectory and the ratio between the bias and diversity of training set. Experimental results indicate that the proposed method effectively captures the generalization trend across various training steps, learning rates, and label noise levels.

Via

Access Paper or Ask Questions

Quadric Representations for LiDAR Odometry, Mapping and Localization

Apr 27, 2023

Chao Xia, Chenfeng Xu, Patrick Rim, Mingyu Ding, Nanning Zheng, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan

Figure 1 for Quadric Representations for LiDAR Odometry, Mapping and Localization

Figure 2 for Quadric Representations for LiDAR Odometry, Mapping and Localization

Figure 3 for Quadric Representations for LiDAR Odometry, Mapping and Localization

Figure 4 for Quadric Representations for LiDAR Odometry, Mapping and Localization

Abstract:Current LiDAR odometry, mapping and localization methods leverage point-wise representations of 3D scenes and achieve high accuracy in autonomous driving tasks. However, the space-inefficiency of methods that use point-wise representations limits their development and usage in practical applications. In particular, scan-submap matching and global map representation methods are restricted by the inefficiency of nearest neighbor searching (NNS) for large-volume point clouds. To improve space-time efficiency, we propose a novel method of describing scenes using quadric surfaces, which are far more compact representations of 3D objects than conventional point clouds. In contrast to point cloud-based methods, our quadric representation-based method decomposes a 3D scene into a collection of sparse quadric patches, which improves storage efficiency and avoids the slow point-wise NNS process. Our method first segments a given point cloud into patches and fits each of them to a quadric implicit function. Each function is then coupled with other geometric descriptors of the patch, such as its center position and covariance matrix. Collectively, these patch representations fully describe a 3D scene, which can be used in place of the original point cloud and employed in LiDAR odometry, mapping and localization algorithms. We further design a novel incremental growing method for quadric representations, which eliminates the need to repeatedly re-fit quadric surfaces from the original point cloud. Extensive odometry, mapping and localization experiments on large-volume point clouds in the KITTI and UrbanLoco datasets demonstrate that our method maintains low latency and memory utility while achieving competitive, and even superior, accuracy.

Via

Access Paper or Ask Questions

MMRDN: Consistent Representation for Multi-View Manipulation Relationship Detection in Object-Stacked Scenes

Apr 25, 2023

Han Wang, Jiayuan Zhang, Lipeng Wan, Xingyu Chen, Xuguang Lan, Nanning Zheng

Figure 1 for MMRDN: Consistent Representation for Multi-View Manipulation Relationship Detection in Object-Stacked Scenes

Figure 2 for MMRDN: Consistent Representation for Multi-View Manipulation Relationship Detection in Object-Stacked Scenes

Figure 3 for MMRDN: Consistent Representation for Multi-View Manipulation Relationship Detection in Object-Stacked Scenes

Figure 4 for MMRDN: Consistent Representation for Multi-View Manipulation Relationship Detection in Object-Stacked Scenes

Abstract:Manipulation relationship detection (MRD) aims to guide the robot to grasp objects in the right order, which is important to ensure the safety and reliability of grasping in object stacked scenes. Previous works infer manipulation relationship by deep neural network trained with data collected from a predefined view, which has limitation in visual dislocation in unstructured environments. Multi-view data provide more comprehensive information in space, while a challenge of multi-view MRD is domain shift. In this paper, we propose a novel multi-view fusion framework, namely multi-view MRD network (MMRDN), which is trained by 2D and 3D multi-view data. We project the 2D data from different views into a common hidden space and fit the embeddings with a set of Von-Mises-Fisher distributions to learn the consistent representations. Besides, taking advantage of position information within the 3D data, we select a set of $K$ Maximum Vertical Neighbors (KMVN) points from the point cloud of each object pair, which encodes the relative position of these two objects. Finally, the features of multi-view 2D and 3D data are concatenated to predict the pairwise relationship of objects. Experimental results on the challenging REGRAD dataset show that MMRDN outperforms the state-of-the-art methods in multi-view MRD tasks. The results also demonstrate that our model trained by synthetic data is capable to transfer to real-world scenarios.

Via

Access Paper or Ask Questions

VPFusion: Towards Robust Vertical Representation Learning for 3D Object Detection

Apr 06, 2023

Yuhao Huang, Sanping Zhou, Junjie Zhang, Jinpeng Dong, Nanning Zheng

Abstract:Efficient point cloud representation is a fundamental element of Lidar-based 3D object detection. Recent grid-based detectors usually divide point clouds into voxels or pillars and construct single-stream networks in Bird's Eye View. However, these point cloud encoding paradigms underestimate the point representation in the vertical direction, which cause the loss of semantic or fine-grained information, especially for vertical sensitive objects like pedestrian and cyclists. In this paper, we propose an explicit vertical multi-scale representation learning framework, VPFusion, to combine the complementary information from both voxel and pillar streams. Specifically, VPFusion first builds upon a sparse voxel-pillar-based backbone. The backbone divides point clouds into voxels and pillars, then encodes features with 3D and 2D sparse convolution simultaneously. Next, we introduce the Sparse Fusion Layer (SFL), which establishes a bidirectional pathway for sparse voxel and pillar features to enable the interaction between them. Additionally, we present the Dense Fusion Neck (DFN) to effectively combine the dense feature maps from voxel and pillar branches with multi-scale. Extensive experiments on the large-scale Waymo Open Dataset and nuScenes Dataset demonstrate that VPFusion surpasses the single-stream baselines by a large margin and achieves state-of-the-art performance with real-time inference speed.

Via

Access Paper or Ask Questions

Milestones in Autonomous Driving and Intelligent Vehicles: Survey of Surveys

Mar 30, 2023

Long Chen, Yuchen Li, Chao Huang, Bai Li, Yang Xing, Daxin Tian, Li Li, Zhongxu Hu, Xiaoxiang Na, Zixuan Li(+6 more)

Figure 1 for Milestones in Autonomous Driving and Intelligent Vehicles: Survey of Surveys

Figure 2 for Milestones in Autonomous Driving and Intelligent Vehicles: Survey of Surveys

Figure 3 for Milestones in Autonomous Driving and Intelligent Vehicles: Survey of Surveys

Figure 4 for Milestones in Autonomous Driving and Intelligent Vehicles: Survey of Surveys

* IEEE Transactions on Intelligent Vehicles, vol. 8, no. 2, pp. 1046-1056, Feb. 2023
* 13 pages, 3 tables, 0 figure

Via

Access Paper or Ask Questions

Prioritized Planning for Target-Oriented Manipulation via Hierarchical Stacking Relationship Prediction

Mar 14, 2023

Zewen Wu, Jian Tang, Xingyu Chen, Chengzhong Ma, Xuguang Lan, Nanning Zheng

Abstract:In scenarios involving the grasping of multiple targets, the learning of stacking relationships between objects is fundamental for robots to execute safely and efficiently. However, current methods lack subdivision for the hierarchy of stacking relationship types. In scenes where objects are mostly stacked in an orderly manner, they are incapable of performing human-like and high-efficient grasping decisions. This paper proposes a perception-planning method to distinguish different stacking types between objects and generate prioritized manipulation order decisions based on given target designations. We utilize a Hierarchical Stacking Relationship Network (HSRN) to discriminate the hierarchy of stacking and generate a refined Stacking Relationship Tree (SRT) for relationship description. Considering that objects with high stacking stability can be grasped together if necessary, we introduce an elaborate decision-making planner based on the Partially Observable Markov Decision Process (POMDP), which leverages observations and generates the least grasp-consuming decision chain with robustness and is suitable for simultaneously specifying multiple targets. To verify our work, we set the scene to the dining table and augment the REGRAD dataset with a set of common tableware models for network training. Experiments show that our method effectively generates grasping decisions that conform to human requirements, and improves the implementation efficiency compared with existing methods on the basis of guaranteeing the success rate.

* 8 pages, 8 figures

Via

Access Paper or Ask Questions

MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering

Mar 02, 2023

Jingjing Jiang, Nanning Zheng

Abstract:Recently, finetuning pretrained vision-language models (VLMs) has become one prevailing paradigm to achieve state-of-the-art performance in VQA. However, as VLMs scale, it becomes computationally expensive, storage inefficient, and prone to overfitting to tune full model parameters for a specific task in low-resource settings. Although current parameter-efficient tuning methods dramatically reduce the number of tunable parameters, there still exists a significant performance gap with full finetuning. In this paper, we propose \textbf{MixPHM}, a redundancy-aware parameter-efficient tuning method that outperforms full finetuning in low-resource VQA. Specifically, MixPHM is a lightweight module implemented by multiple PHM-experts in a mixture-of-experts manner. To reduce parameter redundancy, we reparameterize expert weights in a low-rank subspace and share part of the weights inside and across MixPHM. Moreover, based on our quantitative analysis of representation redundancy, we propose \textbf{redundancy regularization}, which facilitates MixPHM to reduce task-irrelevant redundancy while promoting task-relevant correlation. Experiments conducted on VQA v2, GQA, and OK-VQA with different low-resource settings show that our MixPHM outperforms state-of-the-art parameter-efficient methods and is the only one consistently surpassing full finetuning.

* 14 pages, 6 figures, 9 tables. Accepted by CVPR 2023. Code will be available at \url{https://github.com/jingjing12110/MixPHM}

Via

Access Paper or Ask Questions