Han Wu

LDRNet: Enabling Real-time Document Localization on Mobile Devices

Jun 05, 2022
Han Wu, Holland Qian, Huaming Wu

While Identity Document Verification (IDV) technology on mobile devices is becoming ubiquitous in modern business operations, the risk of identity theft and fraud is increasing. The identity document holder is normally required to participate in an online video interview to deter impostors. However, the current IDV process depends on an additional human workforce to provide online step-by-step guidance, which is inefficient and expensive. The performance of existing AI-based approaches cannot meet the real-time and lightweight demands of mobile devices. In this paper, we address those challenges by designing an edge intelligence-assisted approach for real-time IDV. To improve the responsiveness of the IDV process, we propose a new document localization model for mobile devices, LDRNet, to Localize the identity Document in Real-time. On top of a lightweight backbone network, we build three prediction branches for LDRNet: corner point prediction, line border prediction and document classification. We design novel supplementary targets, the equal-division points, and use a new loss function named Line Loss to improve the speed and accuracy of our approach. Beyond the IDV process, LDRNet is an efficient and reliable document localization alternative for all kinds of mobile applications. As evidence, we compare the performance of LDRNet with other popular approaches on localizing general document datasets. The experimental results show that LDRNet runs at a speed of up to 790 FPS, which is 47x faster, while still achieving a comparable Jaccard Index (JI) in single-model and single-scale tests.
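
The Jaccard Index (JI) mentioned in the abstract above is the ratio of the intersection to the union of the predicted and ground-truth regions. The following is a minimal sketch of that metric only, not the paper's evaluation code: it assumes regions are represented as sets of integer pixel coordinates, and the helper names `rect_pixels` and `jaccard_index` are hypothetical.

```python
def rect_pixels(x0, y0, x1, y1):
    """Return the set of integer pixel coordinates in [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def jaccard_index(region_a, region_b):
    """Jaccard Index |A intersect B| / |A union B| of two pixel sets."""
    union = region_a | region_b
    if not union:
        # Convention assumed here: two empty regions count as identical.
        return 1.0
    return len(region_a & region_b) / len(union)


# Toy example: two 4x4 squares that overlap on a 2x4 strip.
predicted = rect_pixels(2, 0, 6, 4)
ground_truth = rect_pixels(0, 0, 4, 4)
ji = jaccard_index(predicted, ground_truth)  # 8 shared pixels / 24 total
```

In document localization the regions would typically be quadrilaterals rather than axis-aligned rectangles; rasterizing them into pixel sets first would let the same function apply.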

Learning Locality and Isotropy in Dialogue Modeling

May 29, 2022
Han Wu, Haochen Tan, Mingjie Zhan, Gangming Zhao, Shaoqing Lu, Ding Liang, Linqi Song

Existing dialogue modeling methods have achieved promising performance on various dialogue tasks with the aid of the Transformer and large-scale pre-trained language models. However, some recent studies have revealed that the context representations produced by these methods suffer from the problem of anisotropy. In this paper, we find that the generated representations are also not conversational, losing the conversation structure information during the context modeling stage. To this end, we identify two properties in dialogue modeling, i.e., locality and isotropy, and present a simple method for dialogue representation calibration, namely SimDRC, to build isotropic and conversational feature spaces. Experimental results show that our approach significantly outperforms the current state-of-the-art models on three dialogue tasks across automatic and human evaluation metrics. More in-depth analyses further confirm the effectiveness of our proposed approach.

* 18 pages, 4 figures

Learning Localization-aware Target Confidence for Siamese Visual Tracking

Apr 29, 2022
Jiahao Nie, Han Wu, Zhiwei He, Yuxiang Yang, Mingyu Gao, Zhekang Dong

The Siamese tracking paradigm has achieved great success, providing effective appearance discrimination and size estimation through classification and regression. However, such a paradigm typically optimizes the classification and regression independently, leading to task misalignment (accurate prediction boxes do not receive high target confidence scores). In this paper, to alleviate this misalignment, we propose a novel tracking paradigm, called SiamLA. Within this paradigm, a series of simple yet effective localization-aware components are introduced to generate localization-aware target confidence scores. Specifically, with the proposed localization-aware dynamic label (LADL) loss and localization-aware label smoothing (LALS) strategy, collaborative optimization between the classification and regression is achieved, enabling classification scores to be aware of the location state, not just appearance similarity. In addition, we propose a separate localization branch, centered on a localization-aware feature aggregation (LAFA) module, to produce location quality scores that further modify the classification scores. Consequently, the resulting target confidence scores are more discriminative for the location state, so that accurate prediction boxes tend to receive high scores. Extensive experiments are conducted on six challenging benchmarks, including GOT-10k, TrackingNet, LaSOT, TNL2K, OTB100 and VOT2018. Our SiamLA achieves state-of-the-art performance in terms of both accuracy and efficiency. Furthermore, a stability analysis reveals that our tracking paradigm is relatively stable, implying that the paradigm has potential for real-world applications.

Zero-shot Cross-lingual Conversational Semantic Role Labeling

Apr 11, 2022
Han Wu, Haochen Tan, Kun Xu, Shuqi Liu, Lianwei Wu, Linqi Song

While conversational semantic role labeling (CSRL) has shown its usefulness on Chinese conversational tasks, it is still under-explored in non-Chinese languages due to the lack of multilingual CSRL annotations for parser training. To avoid expensive data collection and the error propagation of translation-based methods, we present a simple but effective approach to perform zero-shot cross-lingual CSRL. Our model implicitly learns language-agnostic, conversational structure-aware and semantically rich representations with hierarchical encoders and elaborately designed pre-training objectives. Experimental results show that our model outperforms all baselines by large margins on two newly collected English CSRL test sets. More importantly, we confirm the usefulness of CSRL for non-Chinese conversational tasks, such as the question-in-context rewriting task in English and the multi-turn dialogue response generation tasks in English, German and Japanese, by incorporating the CSRL information into the downstream conversation-based models. We believe this finding is significant and will facilitate research on non-Chinese dialogue tasks, which suffer from the problems of ellipsis and anaphora.

* NAACL 2022 (Findings)

A Sentence is Worth 128 Pseudo Tokens: A Semantic-Aware Contrastive Learning Framework for Sentence Embeddings

Mar 11, 2022
Haochen Tan, Wei Shao, Han Wu, Ke Yang, Linqi Song

Contrastive learning has shown great potential in unsupervised sentence embedding tasks, e.g., SimCSE. However, we find that these existing solutions are heavily affected by superficial features like the length of sentences or syntactic structures. In this paper, we propose a semantics-aware contrastive learning framework for sentence embeddings, termed Pseudo-Token BERT (PT-BERT), which is able to exploit the pseudo-token space (i.e., latent semantic space) representation of a sentence while eliminating the impact of superficial features such as sentence length and syntax. Specifically, we introduce an additional pseudo-token embedding layer, independent of the BERT encoder, to map each sentence into a sequence of pseudo tokens of a fixed length. Leveraging these pseudo sequences, we are able to construct same-length positive and negative pairs based on the attention mechanism to perform contrastive learning. In addition, we utilize both gradient-updating and momentum-updating encoders to encode instances while dynamically maintaining an additional queue to store the representations of sentence embeddings, enhancing the encoder's learning performance for negative examples. Experiments show that our model outperforms the state-of-the-art baselines on six standard semantic textual similarity (STS) tasks. Furthermore, experiments on alignment and uniformity losses, as well as hard examples with different sentence lengths and syntax, consistently verify the effectiveness of our method.
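
The abstract above refers to alignment and uniformity losses without defining them. The sketch below assumes the commonly used definitions for unit-normalized embeddings (alignment as the mean squared distance between positive pairs, uniformity as the log of the mean Gaussian potential over all pairs, with temperature t = 2); whether the paper uses exactly these settings is an assumption, and all function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def alignment_loss(positive_pairs):
    """Mean squared distance between normalized positive pairs (lower is better)."""
    total = sum(
        squared_distance(l2_normalize(x), l2_normalize(y))
        for x, y in positive_pairs
    )
    return total / len(positive_pairs)


def uniformity_loss(embeddings, t=2.0):
    """Log of the mean of exp(-t * ||zi - zj||^2) over distinct pairs (lower is better)."""
    z = [l2_normalize(e) for e in embeddings]
    potentials = [
        math.exp(-t * squared_distance(z[i], z[j]))
        for i in range(len(z))
        for j in range(i + 1, len(z))
    ]
    return math.log(sum(potentials) / len(potentials))


# Toy 2-D embeddings: orthogonal positives and two antipodal points.
align = alignment_loss([([1.0, 0.0], [0.0, 1.0])])   # squared distance is 2
uniform = uniformity_loss([[1.0, 0.0], [-1.0, 0.0]])  # log(exp(-2 * 4)) = -8
```

Lower alignment means positive pairs sit closer together, and lower uniformity means the embeddings are spread more evenly over the unit sphere.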

* Long paper; ACL 2022 (Findings)

Partial Likelihood Thompson Sampling

Mar 02, 2022
Han Wu, Stefan Wager

We consider the problem of deciding how best to target and prioritize existing vaccines that may offer protection against new variants of an infectious disease. Sequential experiments are a promising approach; however, challenges due to delayed feedback and the overall ebb and flow of disease prevalence make available methods inapplicable for this task. We present a method, partial likelihood Thompson sampling, that can handle these challenges. Our method involves running Thompson sampling with belief updates determined by partial likelihood each time we observe an event. To test our approach, we ran a semi-synthetic experiment based on 200 days of COVID-19 infection data in the US.

Ensemble Method for Estimating Individualized Treatment Effects

Feb 28, 2022
Kevin Wu Han, Han Wu

In many medical and business applications, researchers are interested in estimating individualized treatment effects using data from a randomized experiment. For example, in medical applications doctors learn the treatment effects from clinical trials, and in technology companies researchers learn them from A/B testing experiments. Although dozens of machine learning models have been proposed for this task, it is challenging to determine which model will be best for the problem at hand because ground-truth treatment effects are unobservable. In contrast to several recent papers proposing methods to select one of these competing models, we propose an algorithm for aggregating the estimates from a diverse library of models. We compare ensembling to model selection on 43 benchmark datasets, and find that ensembling wins almost every time. Theoretically, we prove that our ensemble model is (asymptotically) at least as accurate as the best model under consideration, even if the number of candidate models is allowed to grow with the sample size.

Thompson Sampling with Unrestricted Delays

Feb 24, 2022
Han Wu, Stefan Wager

We investigate properties of Thompson Sampling in the stochastic multi-armed bandit problem with delayed feedback. In a setting with i.i.d. delays, we establish, to our knowledge, the first regret bounds for Thompson Sampling with arbitrary delay distributions, including ones with unbounded expectation. Our bounds are qualitatively comparable to the best available bounds derived via ad-hoc algorithms, and only depend on delays via selected quantiles of the delay distributions. Furthermore, in extensive simulation experiments, we find that Thompson Sampling outperforms a number of alternative proposals, including methods specifically designed for settings with delayed feedback.
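
To make the delayed-feedback setting concrete, here is a minimal sketch of Beta-Bernoulli Thompson Sampling in which each reward only reaches the learner after an i.i.d. delay. It illustrates the general setup rather than the paper's experiments: the Bernoulli rewards, uniform Beta(1, 1) priors, the Pareto delay with shape 0.8 (chosen because its mean is infinite), and all function names are assumptions made for illustration.

```python
import heapq
import random


def heavy_tailed_delay(rng):
    """Integer delay >= 1 from a Pareto law with shape 0.8 (infinite mean)."""
    return int(rng.paretovariate(0.8))


def thompson_sampling_delayed(true_means, horizon, delay_sampler, seed=0):
    """Run Thompson Sampling where each reward is revealed only after a delay."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k  # Beta posterior: 1 + observed successes per arm
    beta = [1.0] * k   # Beta posterior: 1 + observed failures per arm
    pulls = [0] * k
    pending = []       # min-heap of (arrival_round, arm, reward)

    for t in range(horizon):
        # Update beliefs with every reward whose delay has elapsed.
        while pending and pending[0][0] <= t:
            _, arm, reward = heapq.heappop(pending)
            alpha[arm] += reward
            beta[arm] += 1 - reward

        # Sample one mean per arm from its posterior and play the largest.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        pulls[arm] += 1

        # The reward is generated now but stays hidden until t + delay.
        reward = 1 if rng.random() < true_means[arm] else 0
        heapq.heappush(pending, (t + delay_sampler(rng), arm, reward))

    return pulls, alpha, beta


pulls, alpha, beta = thompson_sampling_delayed(
    [0.2, 0.8], horizon=500, delay_sampler=heavy_tailed_delay
)
```

The algorithm itself is unchanged from standard Thompson Sampling; the only difference is that posterior updates wait in a queue until the corresponding feedback arrives, and some feedback may never arrive within the horizon.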

Distilling Heterogeneity: From Explanations of Heterogeneous Treatment Effect Models to Interpretable Policies

Nov 05, 2021
Han Wu, Sarah Tan, Weiwei Li, Mia Garrard, Adam Obeng, Drew Dimmery, Shaun Singh, Hanson Wang, Daniel Jiang, Eytan Bakshy

Internet companies are increasingly using machine learning models to create personalized policies which assign, for each individual, the best predicted treatment for that individual. These policies are frequently derived from black-box heterogeneous treatment effect (HTE) models that predict individual-level treatment effects. In this paper, we focus on (1) learning explanations for HTE models; (2) learning interpretable policies that prescribe treatment assignments. We also propose guidance trees, an approach to ensemble multiple interpretable policies without the loss of interpretability. These rule-based interpretable policies are easy to deploy and avoid the need to maintain an HTE model in a production environment.

* A short version was presented at MIT CODE 2021

CSAGN: Conversational Structure Aware Graph Network for Conversational Semantic Role Labeling

Sep 23, 2021
Han Wu, Kun Xu, Linqi Song

Conversational semantic role labeling (CSRL) is believed to be a crucial step towards dialogue understanding. However, it remains a major challenge for existing CSRL parsers to handle conversational structural information. In this paper, we present a simple and effective architecture for CSRL which aims to address this problem. Our model is based on a conversational structure-aware graph network which explicitly encodes the speaker-dependent information. We also propose a multi-task learning method to further improve the model. Experimental results on benchmark datasets show that our model with our proposed training objectives significantly outperforms previous baselines.

* To appear in EMNLP 2021
