Ye Yuan

Sam

Semantic Graph Representation Learning for Handwritten Mathematical Expression Recognition

Aug 21, 2023

Zhuang Liu, Ye Yuan, Zhilong Ji, Jingfeng Bai, Xiang Bai

Abstract: Handwritten mathematical expression recognition (HMER) has attracted extensive attention recently. However, current methods cannot explicitly study the interactions between different symbols and may therefore fail when faced with similar symbols. To alleviate this issue, we propose a simple but efficient method to enhance semantic interaction learning (SIL). Specifically, we first construct a semantic graph based on the statistical symbol co-occurrence probabilities. Then we design a semantic aware module (SAM), which projects the visual and classification features into semantic space. The cosine distance between different projected vectors indicates the correlation between symbols. Jointly optimizing HMER and SIL can explicitly enhance the model's understanding of symbol relationships. In addition, SAM can be easily plugged into existing attention-based models for HMER and consistently brings improvement. Extensive experiments on public benchmark datasets demonstrate that our proposed module can effectively enhance the recognition performance. Our method achieves better recognition performance than prior art on both the CROHME and HME100K datasets.

* 12 Pages
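
The semantic graph and the cosine comparison mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the per-expression conditional co-occurrence estimate, the toy corpus, and the names `cooccurrence_graph` and `cosine_similarity` are all assumptions made for illustration, and the learned projection into semantic space is not modeled.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_graph(expressions):
    """Estimate P(b | a) as the fraction of expressions containing a that also contain b.

    This estimator is an assumption for illustration; the abstract only says the
    graph is built from statistical symbol co-occurrence probabilities.
    """
    symbol_counts = Counter()
    pair_counts = Counter()
    for expr in expressions:
        symbols = set(expr)  # count each symbol once per expression
        symbol_counts.update(symbols)
        for a, b in combinations(sorted(symbols), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {(a, b): n / symbol_counts[a] for (a, b), n in pair_counts.items()}


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Hypothetical tokenized expressions, not taken from CROHME or HME100K.
corpus = [
    ["\\sum", "_", "i", "x"],
    ["\\sum", "_", "i", "y"],
    ["x", "+", "y"],
]
graph = cooccurrence_graph(corpus)
```

In this toy corpus, `graph[(a, b)]` is a directed edge weight, so the weight from `+` to `x` can differ from the weight from `x` to `+`.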

DREAM: Domain-free Reverse Engineering Attributes of Black-box Model

Jul 20, 2023

Rongqing Li, Jiaqi Yu, Changsheng Li, Wenhan Luo, Ye Yuan, Guoren Wang

Abstract: Deep learning models are usually black boxes when deployed on machine learning platforms. Prior works have shown that the attributes (e.g., the number of convolutional layers) of a target black-box neural network can be exposed through a sequence of queries. These works share a crucial limitation: they assume that the dataset used to train the target model is known beforehand and leverage this dataset for the model attribute attack. In reality, however, it is difficult to access the training dataset of the target black-box model, so it is doubtful whether the attributes of a target black-box model can still be revealed in this case. In this paper, we investigate a new problem of Domain-agnostic Reverse Engineering the Attributes of a black-box target Model, called DREAM, without requiring the availability of the target model's training dataset, and put forward a general and principled framework by casting this problem as an out-of-distribution (OOD) generalization problem. In this way, we can learn a domain-agnostic model to inversely infer the attributes of a target black-box model with unknown training data. This allows our method to be applied gracefully to an arbitrary domain for model attribute reverse engineering with strong generalization ability. Extensive experimental studies are conducted and the results validate the superiority of our proposed method over the baselines.

TREA: Tree-Structure Reasoning Schema for Conversational Recommendation

Jul 20, 2023

Wendi Li, Wei Wei, Xiaoye Qu, Xian-Ling Mao, Ye Yuan, Wenfeng Xie, Dangyang Chen

Abstract: Conversational recommender systems (CRS) aim to trace the dynamic interests of users in a timely manner through dialogues and generate relevant responses for item recommendations. Recently, various external knowledge bases (especially knowledge graphs) have been incorporated into CRS to enhance the understanding of conversation contexts. However, recent reasoning-based models rely heavily on simplified structures such as linear structures or fixed-hierarchical structures for causality reasoning, and hence cannot fully figure out sophisticated relationships among utterances with external knowledge. To address this, we propose a novel Tree structure Reasoning schEmA named TREA. TREA constructs a multi-hierarchical scalable tree as the reasoning structure to clarify the causal relationships between mentioned entities, and fully utilizes historical conversations to generate more reasonable and suitable responses for recommended results. Extensive experiments on two public CRS datasets have demonstrated the effectiveness of our approach.

* Accepted by ACL2023 main conference

Shared Growth of Graph Neural Networks via Free-direction Knowledge Distillation

Jul 08, 2023

Kaituo Feng, Yikun Miao, Changsheng Li, Ye Yuan, Guoren Wang

Abstract: Knowledge distillation (KD) has been shown to be effective in boosting the performance of graph neural networks (GNNs), where the typical objective is to distill knowledge from a deeper teacher GNN into a shallower student GNN. However, it is often quite challenging to train a satisfactory deeper GNN due to the well-known over-parametrized and over-smoothing issues, leading to invalid knowledge transfer in practical applications. In this paper, we propose the first Free-direction Knowledge Distillation framework via reinforcement learning for GNNs, called FreeKD, which no longer requires a deeper, well-optimized teacher GNN. Our core idea is to collaboratively learn two shallower GNNs in an effort to exchange knowledge between them via reinforcement learning in a hierarchical way. As we observe that one typical GNN model often exhibits better and worse performances at different nodes during training, we devise a dynamic and free-direction knowledge transfer strategy that involves two levels of actions: 1) a node-level action determines the directions of knowledge transfer between the corresponding nodes of two networks; and then 2) a structure-level action determines which of the local structures generated by the node-level actions are to be propagated. Furthermore, considering the diverse knowledge present in different GNNs when dealing with multi-view inputs, we introduce FreeKD++ as a solution to enable free-direction knowledge transfer among multiple shallow GNNs operating on multi-view inputs. Extensive experiments on five benchmark datasets demonstrate that our approaches outperform the base GNNs by a large margin and show their efficacy across various GNNs. More surprisingly, our FreeKD has comparable or even better performance than traditional KD algorithms that distill knowledge from a deeper and stronger teacher GNN.

* There are some bugs with displaying Figure 8 in the arXiv version, which may be due to the type and version of the LaTeX compiler used

NeRFool: Uncovering the Vulnerability of Generalizable Neural Radiance Fields against Adversarial Perturbations

Jun 10, 2023

Yonggan Fu, Ye Yuan, Souvik Kundu, Shang Wu, Shunyao Zhang, Yingyan Lin

Abstract: Generalizable Neural Radiance Fields (GNeRF) are one of the most promising real-world solutions for novel view synthesis, thanks to their cross-scene generalization capability and thus the possibility of instant rendering on new scenes. While adversarial robustness is essential for real-world applications, little study has been devoted to understanding its implications for GNeRF. We hypothesize that because GNeRF is implemented by conditioning on the source views from new scenes, which are often acquired from the Internet or third-party providers, there are potential new security concerns regarding its real-world applications. Meanwhile, existing understanding and solutions for neural networks' adversarial robustness may not be applicable to GNeRF, due to its 3D nature and uniquely diverse operations. To this end, we present NeRFool, which to the best of our knowledge is the first work that sets out to understand the adversarial robustness of GNeRF. Specifically, NeRFool unveils the vulnerability patterns and important insights regarding GNeRF's adversarial robustness. Building upon the insights gained from NeRFool, we further develop NeRFool+, which integrates two techniques capable of effectively attacking GNeRF across a wide range of target views, and provide guidelines for defending against our proposed attacks. We believe that NeRFool and NeRFool+ lay the initial foundation for future innovations in developing robust real-world GNeRF solutions. Our code is available at: https://github.com/GATECH-EIC/NeRFool.

* Accepted by ICML 2023

Masked and Permuted Implicit Context Learning for Scene Text Recognition

May 25, 2023

Xiaomeng Yang, Zhi Qiao, Jin Wei, Yu Zhou, Ye Yuan, Zhilong Ji, Dongbao Yang, Weiping Wang

Abstract: Scene Text Recognition (STR) is a challenging task due to variations in text style, shape, and background. Incorporating linguistic information is an effective way to enhance the robustness of STR models. Existing methods rely on permuted language modeling (PLM) or masked language modeling (MLM) to learn contextual information implicitly, either by training an ensemble of permuted autoregressive (AR) LMs or through an iterative non-autoregressive (NAR) decoding procedure. However, these methods exhibit limitations: PLM's AR decoding lacks information about future characters, while MLM provides global information about the entire text but neglects dependencies among the predicted characters. In this paper, we propose a Masked and Permuted Implicit Context Learning Network for STR, which unifies PLM and MLM within a single decoding architecture, inheriting the advantages of both approaches. We utilize the training procedure of PLM, and to integrate MLM, we incorporate word length information into the decoding process by introducing specific numbers of mask tokens. Experimental results demonstrate that our proposed model achieves state-of-the-art performance on standard benchmarks using both AR and NAR decoding procedures.
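
As a rough illustration of the permuted language modeling idea referred to in the abstract above, the sketch below builds the attention mask implied by one factorization order: each position may attend only to positions that come earlier in that order. This is the generic PLM masking scheme, not the authors' network; the function name `permutation_attention_mask` is hypothetical, and the mask-token mechanism for word length is not modeled.

```python
def permutation_attention_mask(perm):
    """Return allowed[q][k] == True if query position q may attend to key position k.

    `perm` is a factorization order over positions 0..n-1. A position may see
    exactly the positions that precede it in `perm` (generic PLM masking,
    assumed here for illustration).
    """
    n = len(perm)
    allowed = [[False] * n for _ in range(n)]
    for step, query_pos in enumerate(perm):
        for key_pos in perm[:step]:
            allowed[query_pos][key_pos] = True
    return allowed


# The identity order recovers ordinary left-to-right autoregressive masking.
left_to_right = permutation_attention_mask([0, 1, 2])

# A permuted order: position 2 is predicted first, then 0, then 1.
permuted = permutation_attention_mask([2, 0, 1])
```

With the permuted order, position 0 can attend to position 2, which is a "future" character in reading order. That is how training over several orders exposes the decoder to context on both sides.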

Robust Tickets Can Transfer Better: Drawing More Transferable Subnetworks in Transfer Learning

Apr 24, 2023

Yonggan Fu, Ye Yuan, Shang Wu, Jiayi Yuan, Yingyan Lin

Abstract: Transfer learning leverages feature representations of deep neural networks (DNNs) pretrained on source tasks with rich data to empower effective finetuning on downstream tasks. However, the pretrained models are often prohibitively large in order to deliver generalizable representations, which limits their deployment on edge devices with constrained resources. To close this gap, we propose a new transfer learning pipeline, which leverages our finding that robust tickets can transfer better, i.e., subnetworks drawn with properly induced adversarial robustness can achieve better transferability than vanilla lottery ticket subnetworks. Extensive experiments and ablation studies validate that our proposed transfer learning pipeline can achieve enhanced accuracy-sparsity trade-offs across both diverse downstream tasks and sparsity patterns, further enriching the lottery ticket hypothesis.

* Accepted by DAC 2023

Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion

Apr 04, 2023

Davis Rempe, Zhengyi Luo, Xue Bin Peng, Ye Yuan, Kris Kitani, Karsten Kreis, Sanja Fidler, Or Litany

Abstract: We introduce a method for generating realistic pedestrian trajectories and full-body animations that can be controlled to meet user-defined goals. We draw on recent advances in guided diffusion modeling to achieve test-time controllability of trajectories, which is normally only associated with rule-based systems. Our guided diffusion model allows users to constrain trajectories through target waypoints, speed, and specified social groups while accounting for the surrounding environment context. This trajectory diffusion model is integrated with a novel physics-based humanoid controller to form a closed-loop, full-body pedestrian animation system capable of placing large crowds in a simulated environment with varying terrains. We further propose utilizing the value function learned during RL training of the animation controller to guide diffusion to produce trajectories better suited for particular scenarios such as collision avoidance and traversing uneven terrain. Video results are available on the project page at https://nv-tlabs.github.io/trace-pace.

* Conference on Computer Vision and Pattern Recognition (CVPR) 2023

A Generalized Nyquist-Shannon Sampling Theorem Using the Koopman Operator

Mar 06, 2023

Zhexuan Zeng, Ye Yuan

Abstract: The sampling theorem plays a fundamental role in the recovery of continuous-time signals from discrete-time samples in the field of signal processing. The sampling theorem for non-band-limited signals has evolved into one of the most challenging problems. In this work, a generalized sampling theorem, which builds on the Koopman operator, is proved for signals in generator-bounded space (Theorem 1). It naturally extends the Nyquist-Shannon sampling theorem in that: 1) for band-limited signals, the lower bounds on the sampling frequency given by the two theorems are exactly the same; 2) the Koopman operator-based sampling theorem can also provide a finite bound on the sampling frequency for certain types of non-band-limited signals, which cannot be addressed by the Nyquist-Shannon sampling theorem. These types of non-band-limited signals include, but are not limited to, inverse Laplace transforms with a limited imaginary interval of integration and linear combinations of complex exponential functions. Moreover, a Koopman operator-based reconstruction algorithm is provided together with a theoretical convergence result. Using this algorithm, the sampling theorem is effectively illustrated on several signals related to sine, exponential and polynomial signals.
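
For context, the classical band-limited case that the abstract above generalizes can be sketched with Whittaker-Shannon (sinc) interpolation. This is only the textbook Nyquist-Shannon reconstruction, not the Koopman operator-based algorithm of the paper. The signal frequency, sampling rate, sample count, and the helper names `sinc` and `reconstruct` are assumptions chosen for illustration.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def reconstruct(samples, fs, t):
    """Whittaker-Shannon interpolation from samples x[n] = x(n / fs).

    x(t) is approximated by sum_n x[n] * sinc(fs * t - n). With finitely many
    samples the sum is truncated, so the result is approximate.
    """
    return sum(x_n * sinc(fs * t - n) for n, x_n in enumerate(samples))


f0 = 1.0  # assumed signal frequency in Hz
fs = 8.0  # assumed sampling rate in Hz, above the Nyquist rate 2 * f0
samples = [math.sin(2 * math.pi * f0 * n / fs) for n in range(400)]

t = 25.03  # an off-grid time near the middle of the sampled window
approx = reconstruct(samples, fs, t)
exact = math.sin(2 * math.pi * f0 * t)
```

Evaluating near the middle of the sampled window keeps the truncation error of the finite sum small. Close to the window edges the approximation degrades because many significant sinc terms are missing.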

EquiPocket: an E(3)-Equivariant Geometric Graph Neural Network for Ligand Binding Site Prediction

Feb 23, 2023

Yang Zhang, Wenbing Huang, Zhewei Wei, Ye Yuan, Zhaohan Ding

Abstract: Predicting the binding sites of target proteins plays a fundamental role in drug discovery. Most existing deep-learning methods consider a protein as a 3D image by spatially clustering its atoms into voxels and then feeding the voxelized protein into a 3D CNN for prediction. However, the CNN-based methods encounter several critical issues: they are 1) defective in representing irregular protein structures; 2) sensitive to rotations; 3) insufficient to characterize the protein surface; 4) unaware of data distribution shift. To address the above issues, this work proposes EquiPocket, an E(3)-equivariant Graph Neural Network (GNN) for binding site prediction. In particular, EquiPocket consists of three modules: the first one to extract local geometric information for each surface atom, the second one to model both the chemical and spatial structure of the protein, and the last one to capture the geometry of the surface via equivariant message passing over the surface atoms. We further propose a dense attention output layer to better alleviate the data distribution shift effect incurred by the variable protein size. Extensive experiments on several representative benchmarks demonstrate the superiority of our framework over the state-of-the-art methods.
