Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Topic": models, code, and papers

MIDV-2020: A Comprehensive Benchmark Dataset for Identity Document Analysis

Jul 01, 2021
Konstantin Bulatov, Ekaterina Emelianova, Daniil Tropin, Natalya Skoryukina, Yulia Chernyshova, Alexander Sheshkus, Sergey Usilin, Zuheng Ming, Jean-Christophe Burie, Muhammad Muzzamil Luqman, Vladimir V. Arlazarov

Identity documents recognition is an important sub-field of document analysis, which deals with tasks of robust document detection, type identification, text fields recognition, as well as identity fraud prevention and document authenticity validation given photos, scans, or video frames of an identity document capture. Significant amount of research has been published on this topic in recent years, however a chief difficulty for such research is scarcity of datasets, due to the subject matter being protected by security requirements. A few datasets of identity documents which are available lack diversity of document types, capturing conditions, or variability of document field values. In addition, the published datasets were typically designed only for a subset of document recognition problems, not for a complex identity document analysis. In this paper, we present a dataset MIDV-2020 which consists of 1000 video clips, 2000 scanned images, and 1000 photos of 1000 unique mock identity documents, each with unique text field values and unique artificially generated faces, with rich annotation. For the presented benchmark dataset baselines are provided for such tasks as document location and identification, text fields recognition, and face detection. With 72409 annotated images in total, to the date of publication the proposed dataset is the largest publicly available identity documents dataset with variable artificially generated data, and we believe that it will prove invaluable for advancement of the field of document analysis and recognition. The dataset is available for download at and .

  Access Paper or Ask Questions

The Contestation of Tech Ethics: A Sociotechnical Approach to Ethics and Technology in Action

Jun 03, 2021
Ben Green

Recent controversies related to topics such as fake news, privacy, and algorithmic bias have prompted increased public scrutiny of digital technologies and soul-searching among many of the people associated with their development. In response, the tech industry, academia, civil society, and governments have rapidly increased their attention to "ethics" in the design and use of digital technologies ("tech ethics"). Yet almost as quickly as ethics discourse has proliferated across the world of digital technologies, the limitations of these approaches have also become apparent: tech ethics is vague and toothless, is subsumed into corporate logics and incentives, and has a myopic focus on individual engineers and technology design rather than on the structures and cultures of technology production. As a result of these limitations, many have grown skeptical of tech ethics and its proponents, charging them with "ethics-washing": promoting ethics research and discourse to defuse criticism and government regulation without committing to ethical behavior. By looking at how ethics has been taken up in both science and business in superficial and depoliticizing ways, I recast tech ethics as a terrain of contestation where the central fault line is not whether it is desirable to be ethical, but what "ethics" entails and who gets to define it. This framing highlights the significant limits of current approaches to tech ethics and the importance of studying the formulation and real-world effects of tech ethics. In order to identify and develop more rigorous strategies for reforming digital technologies and the social relations that they mediate, I describe a sociotechnical approach to tech ethics, one that reflexively applies many of tech ethics' own lessons regarding digital technologies to tech ethics itself.

  Access Paper or Ask Questions

A Novel Neuron Model of Visual Processor

Apr 15, 2021
Jizhao Liu, Jing Lian, J C Sprott, Yide Ma

Simulating and imitating the neuronal network of humans or mammals is a popular topic that has been explored for many years in the fields of pattern recognition and computer vision. Inspired by neuronal conduction characteristics in the primary visual cortex of cats, pulse-coupled neural networks (PCNNs) can exhibit synchronous oscillation behavior, which can process digital images without training. However, according to the study of single cells in the cat primary visual cortex, when a neuron is stimulated by an external periodic signal, the interspike-interval (ISI) distributions represent a multimodal distribution. This phenomenon cannot be explained by all PCNN models. By analyzing the working mechanism of the PCNN, we present a novel neuron model of the primary visual cortex consisting of a continuous-coupled neural network (CCNN). Our model inherited the threshold exponential decay and synchronous pulse oscillation property of the original PCNN model, and it can exhibit chaotic behavior consistent with the testing results of cat primary visual cortex neurons. Therefore, our CCNN model is closer to real visual neural networks. For image segmentation tasks, the algorithm based on CCNN model has better performance than the state-of-art of visual cortex neural network model. The strength of our approach is that it helps neurophysiologists further understand how the primary visual cortex works and can be used to quantitatively predict the temporal-spatial behavior of real neural networks. CCNN may also inspire engineers to create brain-inspired deep learning networks for artificial intelligence purposes.

  Access Paper or Ask Questions

AutonoML: Towards an Integrated Framework for Autonomous Machine Learning

Dec 23, 2020
David Jacob Kedziora, Katarzyna Musial, Bogdan Gabrys

Over the last decade, the long-running endeavour to automate high-level processes in machine learning (ML) has risen to mainstream prominence, stimulated by advances in optimisation techniques and their impact on selecting ML models/algorithms. Central to this drive is the appeal of engineering a computational system that both discovers and deploys high-performance solutions to arbitrary ML problems with minimal human interaction. Beyond this, an even loftier goal is the pursuit of autonomy, which describes the capability of the system to independently adjust an ML solution over a lifetime of changing contexts. However, these ambitions are unlikely to be achieved in a robust manner without the broader synthesis of various mechanisms and theoretical frameworks, which, at the present time, remain scattered across numerous research threads. Accordingly, this review seeks to motivate a more expansive perspective on what constitutes an automated/autonomous ML system, alongside consideration of how best to consolidate those elements. In doing so, we survey developments in the following research areas: hyperparameter optimisation, multi-component models, neural architecture search, automated feature engineering, meta-learning, multi-level ensembling, dynamic adaptation, multi-objective evaluation, resource constraints, flexible user involvement, and the principles of generalisation. We also develop a conceptual framework throughout the review, augmented by each topic, to illustrate one possible way of fusing high-level mechanisms into an autonomous ML system. Ultimately, we conclude that the notion of architectural integration deserves more discussion, without which the field of automated ML risks stifling both its technical advantages and general uptake.

  Access Paper or Ask Questions

Single-Frame based Deep View Synchronization for Unsynchronized Multi-Camera Surveillance

Jul 08, 2020
Qi Zhang, Antoni B. Chan

Multi-camera surveillance has been an active research topic for understanding and modeling scenes. Compared to a single camera, multi-cameras provide larger field-of-view and more object cues, and the related applications are multi-view counting, multi-view tracking, 3D pose estimation or 3D reconstruction, etc. It is usually assumed that the cameras are all temporally synchronized when designing models for these multi-camera based tasks. However, this assumption is not always valid,especially for multi-camera systems with network transmission delay and low frame-rates due to limited network bandwidth, resulting in desynchronization of the captured frames across cameras. To handle the issue of unsynchronized multi-cameras, in this paper, we propose a synchronization model that works in conjunction with existing DNN-based multi-view models, thus avoiding the redesign of the whole model. Under the low-fps regime, we assume that only a single relevant frame is available from each view, and synchronization is achieved by matching together image contents guided by epipolar geometry. We consider two variants of the model, based on where in the pipeline the synchronization occurs, scene-level synchronization and camera-level synchronization. The view synchronization step and the task-specific view fusion and prediction step are unified in the same framework and trained in an end-to-end fashion. Our view synchronization models are applied to different DNNs-based multi-camera vision tasks under the unsynchronized setting, including multi-view counting and 3D pose estimation, and achieve good performance compared to baselines.

* 12 pages 

  Access Paper or Ask Questions

Augmenting Visual SLAM with Wi-Fi Sensing For Indoor Applications

Mar 15, 2019
Zakieh S. Hashemifar, Charuvahan Adhivarahan, Anand Balakrishnan, Karthik Dantu

Recent trends have accelerated the development of spatial applications on mobile devices and robots. These include navigation, augmented reality, human-robot interaction, and others. A key enabling technology for such applications is the understanding of the device's location and the map of the surrounding environment. This generic problem, referred to as Simultaneous Localization and Mapping (SLAM), is an extensively researched topic in robotics. However, visual SLAM algorithms face several challenges including perceptual aliasing and high computational cost. These challenges affect the accuracy, efficiency, and viability of visual SLAM algorithms, especially for long-term SLAM, and their use in resource-constrained mobile devices. A parallel trend is the ubiquity of Wi-Fi routers for quick Internet access in most urban environments. Most robots and mobile devices are equipped with a Wi-Fi radio as well. We propose a method to utilize Wi-Fi received signal strength to alleviate the challenges faced by visual SLAM algorithms. To demonstrate the utility of this idea, this work makes the following contributions: (i) We propose a generic way to integrate Wi-Fi sensing into visual SLAM algorithms, (ii) We integrate such sensing into three well-known SLAM algorithms, (iii) Using four distinct datasets, we demonstrate the performance of such augmentation in comparison to the original visual algorithms and (iv) We compare our work to Wi-Fi augmented FABMAP algorithm. Overall, we show that our approach can improve the accuracy of visual SLAM algorithms by 11% on average and reduce computation time on average by 15% to 25%.

* 16 pages, 19 figures, Autonomous Robots Journal submission (AuRo) 

  Access Paper or Ask Questions

End-to-end Structure-Aware Convolutional Networks for Knowledge Base Completion

Nov 15, 2018
Chao Shang, Yun Tang, Jing Huang, Jinbo Bi, Xiaodong He, Bowen Zhou

Knowledge graph embedding has been an active research topic for knowledge base completion, with progressive improvement from the initial TransE, TransH, DistMult et al to the current state-of-the-art ConvE. ConvE uses 2D convolution over embeddings and multiple layers of nonlinear features to model knowledge graphs. The model can be efficiently trained and scalable to large knowledge graphs. However, there is no structure enforcement in the embedding space of ConvE. The recent graph convolutional network (GCN) provides another way of learning graph node embedding by successfully utilizing graph connectivity structure. In this work, we propose a novel end-to-end Structure-Aware Convolutional Network (SACN) that takes the benefit of GCN and ConvE together. SACN consists of an encoder of a weighted graph convolutional network (WGCN), and a decoder of a convolutional network called Conv-TransE. WGCN utilizes knowledge graph node structure, node attributes and edge relation types. It has learnable weights that adapt the amount of information from neighbors used in local aggregation, leading to more accurate embeddings of graph nodes. Node attributes in the graph are represented as additional nodes in the WGCN. The decoder Conv-TransE enables the state-of-the-art ConvE to be translational between entities and relations while keeps the same link prediction performance as ConvE. We demonstrate the effectiveness of the proposed SACN on standard FB15k-237 and WN18RR datasets, and it gives about 10% relative improvement over the state-of-the-art ConvE in terms of [email protected], [email protected] and [email protected]

* The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019) 

  Access Paper or Ask Questions

Minimizing Polarization and Disagreement in Social Networks

Dec 28, 2017
Cameron Musco, Christopher Musco, Charalampos E. Tsourakakis

The rise of social media and online social networks has been a disruptive force in society. Opinions are increasingly shaped by interactions on online social media, and social phenomena including disagreement and polarization are now tightly woven into everyday life. In this work we initiate the study of the following question: given $n$ agents, each with its own initial opinion that reflects its core value on a topic, and an opinion dynamics model, what is the structure of a social network that minimizes {\em polarization} and {\em disagreement} simultaneously? This question is central to recommender systems: should a recommender system prefer a link suggestion between two online users with similar mindsets in order to keep disagreement low, or between two users with different opinions in order to expose each to the other's viewpoint of the world, and decrease overall levels of polarization? Our contributions include a mathematical formalization of this question as an optimization problem and an exact, time-efficient algorithm. We also prove that there always exists a network with $O(n/\epsilon^2)$ edges that is a $(1+\epsilon)$ approximation to the optimum. For a fixed graph, we additionally show how to optimize our objective function over the agents' innate opinions in polynomial time. We perform an empirical study of our proposed methods on synthetic and real-world data that verify their value as mining tools to better understand the trade-off between of disagreement and polarization. We find that there is a lot of space to reduce both polarization and disagreement in real-world networks; for instance, on a Reddit network where users exchange comments on politics, our methods achieve a $\sim 60\,000$-fold reduction in polarization and disagreement.

* 19 pages (accepted, WWW 2018) 

  Access Paper or Ask Questions

Training a Tokenizer for Free with Private Federated Learning

Mar 15, 2022
Eugene Bagdasaryan, Congzheng Song, Rogier van Dalen, Matt Seigel, Áine Cahill

Federated learning with differential privacy, i.e. private federated learning (PFL), makes it possible to train models on private data distributed across users' devices without harming privacy. PFL is efficient for models, such as neural networks, that have a fixed number of parameters, and thus a fixed-dimensional gradient vector. Such models include neural-net language models, but not tokenizers, the topic of this work. Training a tokenizer requires frequencies of words from an unlimited vocabulary, and existing methods for finding an unlimited vocabulary need a separate privacy budget. A workaround is to train the tokenizer on publicly available data. However, in this paper we first show that a tokenizer trained on mismatched data results in worse model performance compared to a privacy-violating "oracle" tokenizer that accesses user data, with perplexity increasing by 20%. We also show that sub-word tokenizers are better suited to the federated context than word-level ones, since they can encode new words, though with more tokens per word. Second, we propose a novel method to obtain a tokenizer without using any additional privacy budget. During private federated learning of the language model, we sample from the model, train a new tokenizer on the sampled sequences, and update the model embeddings. We then continue private federated learning, and obtain performance within 1% of the "oracle" tokenizer. Since this process trains the tokenizer only indirectly on private data, we can use the "postprocessing guarantee" of differential privacy and thus use no additional privacy budget.

  Access Paper or Ask Questions

Between welcome culture and border fence. A dataset on the European refugee crisis in German newspaper reports

Nov 19, 2021
Nico Blokker, André Blessing, Erenay Dayanik, Jonas Kuhn, Sebastian Padó, Gabriella Lapesa

Newspaper reports provide a rich source of information on the unfolding of public debate on specific policy fields that can serve as basis for inquiry in political science. Such debates are often triggered by critical events, which attract public attention and incite the reactions of political actors: crisis sparks the debate. However, due to the challenges of reliable annotation and modeling, few large-scale datasets with high-quality annotation are available. This paper introduces DebateNet2.0, which traces the political discourse on the European refugee crisis in the German quality newspaper taz during the year 2015. The core units of our annotation are political claims (requests for specific actions to be taken within the policy field) and the actors who make them (politicians, parties, etc.). The contribution of this paper is twofold. First, we document and release DebateNet2.0 along with its companion R package, mardyR, guiding the reader through the practical and conceptual issues related to the annotation of policy debates in newspapers. Second, we outline and apply a Discourse Network Analysis (DNA) to DebateNet2.0, comparing two crucial moments of the policy debate on the 'refugee crisis': the migration flux through the Mediterranean in April/May and the one along the Balkan route in September/October. Besides the released resources and the case-study, our contribution is also methodological: we talk the reader through the steps from a newspaper article to a discourse network, demonstrating that there is not just one discourse network for the German migration debate, but multiple ones, depending on the topic of interest (political actors, policy fields, time spans).

* Submitted to Language Resources and Evaluation. This manuscript is an extended version of 

  Access Paper or Ask Questions