Lu Li

M2HGCL: Multi-Scale Meta-Path Integrated Heterogeneous Graph Contrastive Learning

Sep 03, 2023
Yuanyuan Guo, Yu Xia, Rui Wang, Rongcheng Duan, Lu Li, Jiangmeng Li

Inspired by the successful application of contrastive learning on graphs, researchers have attempted to apply graph contrastive learning approaches to heterogeneous information networks. Unlike homogeneous graphs, heterogeneous graphs contain diverse node and edge types, so specialized graph contrastive learning methods are required. Most existing methods for heterogeneous graph contrastive learning are implemented by transforming heterogeneous graphs into homogeneous graphs, which risks discarding the valuable information carried by non-target nodes and thereby degrading the performance of contrastive learning models. Additionally, current heterogeneous graph contrastive learning methods are mainly based on the initial meta-paths given by the dataset, yet our in-depth exploration yields two empirical conclusions: the initial meta-paths alone do not contain sufficiently discriminative information, and incorporating various types of meta-paths can effectively improve the performance of heterogeneous graph contrastive learning methods. To this end, we propose a new multi-scale meta-path integrated heterogeneous graph contrastive learning (M2HGCL) model, which discards the conventional heterogeneity-to-homogeneity transformation and performs graph contrastive learning in a joint manner. Specifically, we expand the meta-paths and jointly aggregate the direct neighbor information, the initial meta-path neighbor information and the expanded meta-path neighbor information to sufficiently capture discriminative information. A specific positive sampling strategy is further applied to remedy an intrinsic deficiency of contrastive learning, i.e., the hard negative sampling issue. Through extensive experiments on three real-world datasets, we demonstrate that M2HGCL outperforms the current state-of-the-art baseline models.

* Accepted to ADMA 2023 as an oral presentation
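To make the joint aggregation and contrastive objective described above more concrete, here is a minimal, illustrative PyTorch sketch. It assumes dense 0/1 adjacency matrices for the direct-neighbor, initial meta-path and expanded meta-path views, a shared projection layer, and a precomputed positive-sample mask; none of these names come from the paper's code.

```python
import torch
import torch.nn.functional as F

def aggregate_scales(x, adjs, proj):
    """Mean-aggregate node features over several adjacency matrices
    (direct neighbors, initial meta-paths, expanded meta-paths) and
    fuse the per-scale embeddings by averaging."""
    views = []
    for adj in adjs:                                  # each adj: (N, N) 0/1 matrix
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        views.append(proj(adj @ x / deg))             # one embedding per scale
    return torch.stack(views).mean(dim=0)

def info_nce(z1, z2, pos_mask, tau=0.5):
    """InfoNCE-style loss in which pos_mask marks which nodes of the other
    view count as positives (a stand-in for the paper's positive sampling)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = torch.exp(z1 @ z2.t() / tau)                # (N, N) similarities
    pos = (sim * pos_mask).sum(dim=1)
    return -torch.log(pos / sim.sum(dim=1)).mean()
```

In this sketch, the two views z1 and z2 would come from aggregating the same target nodes under different meta-path scales or augmentations.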

AI For Global Climate Cooperation 2023 Competition Proceedings

Jul 10, 2023
Yoshua Bengio, Prateek Gupta, Lu Li, Soham Phade, Sunil Srinivasa, Andrew Williams, Tianyu Zhang, Yang Zhang, Stephan Zheng

The international community must collaborate to mitigate climate change and sustain economic growth. However, collaboration is hard to achieve, partly because no global authority can ensure compliance with international climate agreements. Combining AI with climate-economic simulations offers a promising solution to design international frameworks, including negotiation protocols and climate agreements, that promote and incentivize collaboration. In addition, these frameworks should also support the fulfillment of policy goals and sustained commitment, taking into account climate-economic dynamics and strategic behaviors. These challenges require an interdisciplinary approach across machine learning, economics, climate science, law, policy, ethics, and other fields. Towards this objective, we organized AI for Global Climate Cooperation, a Mila competition in which teams submitted proposals and analyses of international frameworks based on (modifications of) RICE-N, an AI-driven integrated assessment model (IAM). In particular, RICE-N supports modeling regional decision-making using AI agents, and the IAM then models the climate-economic impact of those decisions into the future. Whereas the first track focused only on performance metrics, the proposals submitted to the second track were evaluated both quantitatively and qualitatively. The quantitative evaluation focused on a combination of (i) the degree of mitigation of global temperature rise and (ii) the increase in economic productivity. In addition, an interdisciplinary panel of human experts in law, policy, sociology, economics and environmental science evaluated the solutions qualitatively, considering the effectiveness, simplicity, feasibility, ethics, and notions of climate justice of the protocols. In the third track, participants were asked to critique and improve RICE-N.


Normalization Enhances Generalization in Visual Reinforcement Learning

Jun 01, 2023
Lu Li, Jiafei Lyu, Guozheng Ma, Zilin Wang, Zhenjie Yang, Xiu Li, Zhiheng Li

Recent advances in visual reinforcement learning (RL) have led to impressive success in handling complex tasks. However, these methods have demonstrated limited generalization capability to visual disturbances, which poses a significant challenge for their real-world application and adaptability. Though normalization techniques have demonstrated huge success in supervised and unsupervised learning, their applications in visual RL are still scarce. In this paper, we explore the potential benefits of integrating normalization into visual RL methods with respect to generalization performance. We find that, perhaps surprisingly, incorporating suitable normalization techniques is sufficient to enhance the generalization capabilities, without any additional special design. We utilize the combination of two normalization techniques, CrossNorm and SelfNorm, for generalizable visual RL. Extensive experiments are conducted on DMControl Generalization Benchmark and CARLA to validate the effectiveness of our method. We show that our method significantly improves generalization capability while only marginally affecting sample efficiency. In particular, when integrated with DrQ-v2, our method enhances the test performance of DrQ-v2 on CARLA across various scenarios, from 14% of the training performance to 97%.
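As a rough illustration of the kind of normalization the abstract refers to, here is a minimal sketch of a CrossNorm-style operation that exchanges per-channel statistics between two feature maps; it is not the authors' implementation, and it omits SelfNorm and the integration with the RL encoder.

```python
import torch

def crossnorm(x_a, x_b, eps=1e-5):
    """Swap per-channel mean/std between two feature maps of shape (B, C, H, W),
    re-styling each sample with the other's channel statistics."""
    def stats(x):
        mu = x.mean(dim=(2, 3), keepdim=True)
        sigma = x.var(dim=(2, 3), keepdim=True).add(eps).sqrt()
        return mu, sigma

    mu_a, sig_a = stats(x_a)
    mu_b, sig_b = stats(x_b)
    x_a2b = (x_a - mu_a) / sig_a * sig_b + mu_b   # x_a with x_b's statistics
    x_b2a = (x_b - mu_b) / sig_b * sig_a + mu_a
    return x_a2b, x_b2a
```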


Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning

May 25, 2023
Guozheng Ma, Linrui Zhang, Haoyu Wang, Lu Li, Zilin Wang, Zhen Wang, Li Shen, Xueqian Wang, Dacheng Tao

Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms. Notably, employing simple observation transformations alone can yield outstanding performance without extra auxiliary representation tasks or pre-trained encoders. However, it remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL. To investigate this issue and further explore the potential of DA, this work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy and provides the following insights and improvements: (1) For individual DA operations, we reveal that both ample spatial diversity and slight hardness are indispensable. Building on this finding, we introduce Random PadResize (Rand PR), a new DA operation that offers abundant spatial diversity with minimal hardness. (2) For multi-type DA fusion schemes, the increased DA hardness and unstable data distribution result in the current fusion schemes being unable to achieve higher sample efficiency than their corresponding individual operations. Taking the non-stationary nature of RL into account, we propose an RL-tailored multi-type DA fusion scheme called Cycling Augmentation (CycAug), which performs periodic cycles of different DA operations to increase type diversity while maintaining data distribution consistency. Extensive evaluations on the DeepMind Control suite and CARLA driving simulator demonstrate that our methods achieve superior sample efficiency compared with the prior state-of-the-art methods.

* 27 pages, 21 figures 
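The two ideas above lend themselves to a short sketch: a pad-then-resize augmentation in the spirit of Random PadResize, and a scheduler that cycles through augmentation operations in the spirit of CycAug. The padding range, padding mode and cycle length below are placeholder assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def random_pad_resize(obs, max_pad=12):
    """Pad a batch of images (B, C, H, W) by random border widths, then
    resize back to the original resolution."""
    _, _, h, w = obs.shape
    left, right, top, bottom = torch.randint(0, max_pad + 1, (4,)).tolist()
    padded = F.pad(obs, (left, right, top, bottom), mode='replicate')
    return F.interpolate(padded, size=(h, w), mode='bilinear', align_corners=False)

def cycling_augmentation(step, ops, cycle_len=10_000):
    """Pick one augmentation at a time and switch periodically, so the data
    distribution stays consistent within each cycle."""
    return ops[(step // cycle_len) % len(ops)]
```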

SLUGBOT, an Aplysia-inspired Robotic Grasper for Studying Control

Nov 21, 2022
Kevin Dai, Ravesh Sukhnandan, Michael Bennington, Karen Whirley, Ryan Bao, Lu Li, Jeffrey P. Gill, Hillel J. Chiel, Victoria A. Webster-Wood

Living systems can use a single periphery to perform a variety of tasks and adapt to a dynamic environment. This multifunctionality is achieved through the use of neural circuitry that adaptively controls the reconfigurable musculature. Current robotic systems struggle to flexibly adapt to unstructured environments. Through mimicry of the neuromechanical coupling seen in living organisms, robotic systems could potentially achieve greater autonomy. The tractable neuromechanics of the sea slug Aplysia californica's feeding apparatus, or buccal mass, make it an ideal candidate for applying neuromechanical principles to the control of a soft robot. In this work, a robotic grasper was designed to mimic specific morphological features of the Aplysia feeding apparatus, including soft actuators akin to biological muscle, a deformable grasping surface, and a similar muscular architecture. A previously developed Boolean neural controller was then adapted for the control of this soft robotic system. The robot was capable of qualitatively replicating swallowing behavior by cyclically ingesting a plastic tube. The robot's normalized translational and rotational odontophore kinematics followed profiles observed in vivo despite morphological differences. This brings Aplysia-inspired control in roboto one step closer to multifunctional neural control schema in vivo and in silico. Future additions may improve SLUGBOT's viability as a neuromechanical research platform.

* Submitted and accepted to Living Machines 2022 conference 

Mathematical Justification of Hard Negative Mining via Isometric Approximation Theorem

Oct 20, 2022
Albert Xu, Jhih-Yi Hsieh, Bhaskar Vundurthy, Eliana Cohen, Howie Choset, Lu Li

In deep metric learning, the Triplet Loss has emerged as a popular method for learning many computer vision and natural language processing tasks such as facial recognition, object detection, and visual-semantic embeddings. One issue that plagues the Triplet Loss is network collapse, an undesirable phenomenon where the network projects the embeddings of all data onto a single point. Researchers predominantly solve this problem by using triplet mining strategies. While hard negative mining is the most effective of these strategies, existing formulations lack strong theoretical justification for their empirical success. In this paper, we utilize the mathematical theory of isometric approximation to show an equivalence between the Triplet Loss sampled by hard negative mining and an optimization problem that minimizes a Hausdorff-like distance between the neural network and its ideal counterpart function. This provides the theoretical justification for hard negative mining's empirical efficacy. In addition, our novel application of the isometric approximation theorem provides the groundwork for future forms of hard negative mining that avoid network collapse. Our theory can also be extended to analyze other Euclidean space-based metric learning methods such as Ladder Loss or Contrastive Learning.

* 9 pages, 6 figures, submitted to AAAI 2023 
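To make the setting concrete, here is a minimal batch-hard triplet loss in PyTorch, in which each anchor is paired with its hardest in-batch positive and negative; this is a standard formulation of hard negative mining, shown only as background for the theory discussed above.

```python
import torch
import torch.nn.functional as F

def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """For each anchor, take the farthest same-label embedding (hardest positive)
    and the closest different-label embedding (hardest negative)."""
    emb = F.normalize(embeddings, dim=1)
    dist = torch.cdist(emb, emb)                        # (N, N) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)

    hardest_pos = (dist * same.float()).max(dim=1).values
    hardest_neg = dist.masked_fill(same, float('inf')).min(dim=1).values
    return F.relu(hardest_pos - hardest_neg + margin).mean()
```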

READ: Large-Scale Neural Scene Rendering for Autonomous Driving

May 11, 2022
Zhuopeng Li, Lu Li, Zeyu Ma, Ping Zhang, Junbo Chen, Jianke Zhu

Synthesizing free-view photo-realistic images is an important task in multimedia. With the development of advanced driver assistance systems (ADAS) and their applications in autonomous vehicles, experimenting with different scenarios becomes a challenge. Although photo-realistic street scenes can be synthesized by image-to-image translation methods, these methods cannot produce coherent scenes due to the lack of 3D information. In this paper, a large-scale neural rendering method is proposed to synthesize the autonomous driving scene (READ), which makes it possible to synthesize large-scale driving scenarios on a PC through a variety of sampling schemes. In order to represent driving scenarios, we propose an ω rendering network to learn neural descriptors from sparse point clouds. Our model can not only synthesize realistic driving scenes but also stitch and edit driving scenes. Experiments show that our model performs well in large-scale driving scenarios.
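As a very rough illustration of point-based neural rendering in general (not the paper's ω rendering network), the sketch below attaches a learnable descriptor to each point, splats the points into an image-plane feature map with a naive z-buffer, and decodes the map to RGB with a small CNN; all names and sizes are assumptions.

```python
import torch
import torch.nn as nn

class PointDescriptorRenderer(nn.Module):
    """Each 3D point carries a learnable descriptor; points are splatted into an
    image-plane feature map with a z-buffer, and a small CNN decodes it to RGB."""
    def __init__(self, num_points, dim=8):
        super().__init__()
        self.desc = nn.Parameter(torch.randn(num_points, dim) * 0.01)
        self.decoder = nn.Sequential(
            nn.Conv2d(dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())

    def forward(self, uv, depth, hw):
        h, w = hw
        feat = torch.zeros(self.desc.shape[1], h, w)
        zbuf = torch.full((h, w), float('inf'))
        for i, (u, v) in enumerate(uv.long()):        # naive z-buffer splatting
            if 0 <= v < h and 0 <= u < w and depth[i] < zbuf[v, u]:
                zbuf[v, u] = depth[i]
                feat[:, v, u] = self.desc[i]
        return self.decoder(feat.unsqueeze(0))
```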


Design of a Biomimetic Tactile Sensor for Material Classification

Mar 29, 2022
Kevin Dai, Xinyu Wang, Allison M. Rojas, Evan Harber, Yu Tian, Nicholas Paiva, Joseph Gnehm, Evan Schindewolf, Howie Choset, Victoria A. Webster-Wood, Lu Li

Tactile sensing typically involves active exploration of unknown surfaces and objects, making it especially effective at processing the characteristics of materials and textures. A key property extracted by human tactile perception is surface roughness, which relies on measuring vibratory signals using the multi-layered fingertip structure. Existing robotic systems lack tactile sensors that are able to provide high dynamic sensing ranges, perceive material properties, and maintain a low hardware cost. In this work, we introduce the reference design and fabrication procedure of a miniature and low-cost tactile sensor consisting of a biomimetic cutaneous structure, including the artificial fingerprint, dermis, epidermis, and an embedded magnet-sensor structure which serves as a mechanoreceptor for converting mechanical information to digital signals. The presented sensor is capable of detecting high-resolution magnetic field data through the Hall effect and creating high-dimensional time-frequency domain features for material texture classification. Additionally, we investigate the effects of different superficial sensor fingerprint patterns for classifying materials through both simulation and physical experimentation. After extracting time series and frequency domain features, we assess a k-nearest neighbors classifier for distinguishing between different materials. The results from our experiments show that our biomimetic tactile sensors with fingerprint ridges can classify materials with more than 8% higher accuracy and lower variability than ridge-less sensors. These results, along with the low cost and customizability of our sensor, demonstrate high potential for lowering the barrier to entry for a wide array of robotic applications, including model-less tactile sensing for texture classification, material inspection, and object recognition.

* To be published in ICRA 2022 
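For intuition about the classification pipeline sketched in the abstract, here is a toy feature-extraction and k-nearest-neighbors example in Python; the feature set and neighbor count are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def vibration_features(signal, n_bins=16):
    """Toy time/frequency features from a 1-D magnetometer/vibration trace:
    simple statistics plus binned FFT magnitudes."""
    spec = np.abs(np.fft.rfft(signal))
    bins = np.array_split(spec, n_bins)
    return np.concatenate([[signal.mean(), signal.std(),
                            np.abs(np.diff(signal)).mean()],
                           [b.mean() for b in bins]])

# Hypothetical usage: X holds one trace per swipe, y the material label.
# X_feat = np.stack([vibration_features(x) for x in X])
# clf = KNeighborsClassifier(n_neighbors=5).fit(X_feat, y)
```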

Towards a Multispectral RGB-IR-UV-D Vision System -- Seeing the Invisible in 3D

Aug 19, 2021
Tanhao Zhang, Luyin Hu, Lu Li, David Navarro-Alarcon

In this paper, we present the development of a sensing system with the capability to compute multispectral point clouds in real time. The proposed multi-eye sensor system effectively registers information from the visible, (long-wave) infrared, and ultraviolet spectrum to its depth sensing frame, thus enabling the measurement of a wider range of surface features that are otherwise hidden to the naked eye. For that, we designed a new cross-calibration apparatus that produces consistent features that can be sensed by each of the cameras, thereby acting as a multispectral "chessboard". The performance of the sensor is evaluated in two different case studies, where we show that the proposed system can detect "hidden" features of a 3D environment.
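The core registration step, projecting depth-frame points into a calibrated spectral camera and sampling its image, can be sketched as follows; the intrinsics, extrinsics and function names here are illustrative assumptions, not the system's actual API.

```python
import numpy as np

def attach_spectral_channel(points_xyz, K_spec, T_depth_to_spec, spec_img):
    """Project 3-D points from the depth frame into a calibrated spectral camera
    (e.g., IR or UV) and look up a per-point intensity, yielding one channel of a
    multispectral point cloud. K_spec and T_depth_to_spec would come from a
    cross-calibration step."""
    pts_h = np.c_[points_xyz, np.ones(len(points_xyz))]          # homogeneous coords
    pts_cam = (T_depth_to_spec @ pts_h.T)[:3]                    # into spectral frame
    uv = (K_spec @ pts_cam)[:2] / pts_cam[2]                     # pinhole projection
    u = np.clip(np.round(uv[0]).astype(int), 0, spec_img.shape[1] - 1)
    v = np.clip(np.round(uv[1]).astype(int), 0, spec_img.shape[0] - 1)
    return np.c_[points_xyz, spec_img[v, u]]                     # xyz + intensity
```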


Hybrid Reasoning Network for Video-based Commonsense Captioning

Aug 05, 2021
Weijiang Yu, Jian Liang, Lei Ji, Lu Li, Yuejian Fang, Nong Xiao, Nan Duan

The task of video-based commonsense captioning aims to generate event-wise captions and meanwhile provide multiple commonsense descriptions (e.g., attribute, effect and intention) about the underlying event in the video. Prior works explore commonsense captions by using separate networks for different commonsense types, which is time-consuming and fails to exploit the interactions among different commonsense types. In this paper, we propose a Hybrid Reasoning Network (HybridNet) to endow neural networks with the capability of semantic-level reasoning and word-level reasoning. First, we develop multi-commonsense learning for semantic-level reasoning by jointly training different commonsense types in a unified network, which encourages interaction among the clues of multiple commonsense descriptions, event-wise captions and videos. Then, word-level reasoning is achieved in two steps: (1) a memory module records the history predicted sequence from the previous generation processes; (2) a memory-routed multi-head attention (MMHA) module updates the word-level attention maps by incorporating the history information from the memory module into the transformer decoder. Moreover, multimodal features are used to make full use of diverse knowledge for commonsense reasoning. Experiments and abundant analysis on the large-scale Video-to-Commonsense benchmark show that our HybridNet achieves state-of-the-art performance compared with other methods.

* 11 pages, 6 figures 
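The word-level reasoning step can be pictured as attention over a memory of previously generated states; the following is a generic sketch of that idea using PyTorch's built-in multi-head attention, not the paper's MMHA module, and all shapes and names are assumptions.

```python
import torch
import torch.nn as nn

class MemoryRoutedAttention(nn.Module):
    """Concatenate a history memory of previously generated hidden states to the
    keys/values, so the decoder can attend to its own generation history as well
    as the current multimodal context."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, query, memory, context):
        # memory: (B, M, dim) history states; context: (B, S, dim) video/caption features
        kv = torch.cat([memory, context], dim=1)
        out, _ = self.attn(query, kv, kv)
        return out
```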