"Information": models, code, and papers

Information Extraction - A User Guide

Feb 11, 1997
Hamish Cunningham

This technical memo describes Information Extraction from the point of view of a potential user of the technology. No knowledge of language processing is assumed. Information Extraction is a process that takes unseen texts as input and produces fixed-format, unambiguous data as output. This data may be displayed directly to users, stored in a database or spreadsheet for later analysis, or used for indexing in Information Retrieval applications. See also

* LaTeX2e with PostScript figures, 17 pages (figures replaced with smaller versions) 
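
As a rough illustration of "unseen text in, fixed-format data out", the sketch below fills a fixed record from free text with a single hand-written pattern. The `Appointment` record, the pattern, and the example sentence are all hypothetical and are not taken from the memo; real Information Extraction systems are considerably more elaborate.

```python
import re
from dataclasses import dataclass


@dataclass
class Appointment:
    """Fixed-format output record (hypothetical schema for illustration)."""
    person: str
    role: str
    company: str


# Hypothetical pattern: "<First Last> was appointed <role> of <Company>".
PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) was appointed "
    r"(?P<role>[a-z ]+?) of (?P<company>[A-Z][A-Za-z]+)"
)


def extract(text: str) -> list[Appointment]:
    """Return one record per sentence fragment that matches the pattern."""
    return [Appointment(**m.groupdict()) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    for record in extract("Jane Smith was appointed chief executive of Acme on Monday."):
        print(record)
```

The records could then be written to a database or spreadsheet, as the abstract describes.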


SIFT: An Algorithm for Extracting Structural Information From Taxonomies

Feb 23, 2016
Jorge Martinez-Gil

In this work we present SIFT, a 3-step algorithm for the analysis of the structural information represented by means of a taxonomy. The major advantage of this algorithm is its capability to leverage the information inherent in the hierarchical structures of taxonomies to infer correspondences that allow them to be merged in a later step. This method is particularly relevant in scenarios where taxonomy alignment techniques exploiting textual information from taxonomy nodes cannot operate successfully.

* 12 pages 


The Dual Information Bottleneck

Jun 08, 2020
Zoe Piran, Ravid Shwartz-Ziv, Naftali Tishby

The Information Bottleneck (IB) framework is a general characterization of optimal representations obtained using a principled approach for balancing accuracy and complexity. Here we present a new framework, the Dual Information Bottleneck (dualIB), which resolves some of the known drawbacks of the IB. We provide a theoretical analysis of the dualIB framework; (i) solving for the structure of its solutions (ii) unraveling its superiority in optimizing the mean prediction error exponent and (iii) demonstrating its ability to preserve exponential forms of the original distribution. To approach large scale problems, we present a novel variational formulation of the dualIB for Deep Neural Networks. In experiments on several data-sets, we compare it to a variational form of the IB. This exposes superior Information Plane properties of the dualIB and its potential in improvement of the error.


Decomposing Textual Information For Style Transfer

Sep 26, 2019
Ivan P. Yamshchikov, Viacheslav Shibaev, Aleksander Nagaev, Jürgen Jost, Alexey Tikhonov

This paper focuses on latent representations that could effectively decompose different aspects of textual information. Using a framework of style transfer for texts, we propose several empirical methods to assess information decomposition quality. We validate these methods with several state-of-the-art textual style transfer methods. Higher quality of information decomposition corresponds to higher performance in terms of bilingual evaluation understudy (BLEU) between output and human-written reformulations.

* arXiv admin note: substantial text overlap with arXiv:1908.06809 


A Closer Look at the Adversarial Robustness of Information Bottleneck Models

Jul 12, 2021
Iryna Korshunova, David Stutz, Alexander A. Alemi, Olivia Wiles, Sven Gowal

We study the adversarial robustness of information bottleneck models for classification. Previous works showed that the robustness of models trained with information bottlenecks can improve upon adversarial training. Our evaluation under a diverse range of white-box $l_{\infty}$ attacks suggests that information bottlenecks alone are not a strong defense strategy, and that previous results were likely influenced by gradient obfuscation.


Time-scales, Meaning, and Availability of Information in a Global Brain

Jul 11, 2003
Carlos Gershenson, Gottfried Mayer-Kress, Atin Das, Pritha Das, Matus Marko

We note the importance of time-scales, meaning, and availability of information for the emergence of novel information meta-structures at a global scale. We discuss previous work in this area and develop future perspectives. We focus on the transmission of scientific articles and the integration of traditional conferences with their virtual extensions on the Internet, their time-scales, and availability. We mention the Semantic Web as an effort for integrating meaningful information.

* 8 pages, 1 figure 


Normalized Information Distance

Sep 15, 2008
Paul M. B. Vitanyi, Frank J. Balbach, Rudi L. Cilibrasi, Ming Li

The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wide Web can be used. These practical realizations of the normalized information distance can then be applied to machine learning tasks, especially clustering, to perform feature-free and parameter-free data mining. This chapter discusses the theoretical foundations of the normalized information distance and both practical realizations. It presents numerous examples of successful real-world applications based on these distance measures, ranging from bioinformatics to music clustering to machine translation.

* 33 pages, 12 figures, pdf, in: Normalized information distance, in: Information Theory and Statistical Learning, Eds. M. Dehmer, F. Emmert-Streib, Springer-Verlag, New-York, To appear 
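
The compression-based approximation mentioned in the abstract is commonly written as the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The following is a minimal sketch under the assumption that `zlib` stands in for the compressor; the function names are illustrative and the choice of compressor is not taken from the chapter.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, used as a stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share most of their structure;
    values near 1 suggest they share little.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    a = b"the quick brown fox jumps over the lazy dog " * 20
    b = b"the quick brown fox leaps over the lazy cat " * 20
    print(ncd(a, a), ncd(a, b))
```

Because real compressors are imperfect, the result is only an approximation: it can slightly exceed 1, and it is not exactly 0 for identical inputs.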


A possibilistic handling of partially ordered information

Oct 19, 2012
Salem Benferhat, Sylvain Lagrue, Odile Papini

In standard possibilistic logic, prioritized information is encoded by means of a weighted knowledge base. This paper proposes an extension of possibilistic logic for dealing with partially ordered information. We show that all basic notions of standard possibilistic logic (subsumption, syntactic and semantic inference, etc.) have natural counterparts when dealing with partially ordered information. We also propose an algorithm which computes possibilistic conclusions of a partial knowledge base of a partially ordered knowledge base.

* Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI2003) 


Disentangled Information Bottleneck

Dec 22, 2020
Ziqi Pan, Li Niu, Jianfu Zhang, Liqing Zhang

The information bottleneck (IB) method is a technique for extracting information that is relevant for predicting the target random variable from the source random variable, which is typically implemented by optimizing the IB Lagrangian that balances the compression and prediction terms. However, the IB Lagrangian is hard to optimize, and multiple trials are required to tune the value of the Lagrangian multiplier. Moreover, we show that the prediction performance strictly decreases as the compression gets stronger while optimizing the IB Lagrangian. In this paper, we implement the IB method from the perspective of supervised disentangling. Specifically, we introduce Disentangled Information Bottleneck (DisenIB) that is consistent on compressing source maximally without target prediction performance loss (maximum compression). Theoretical and experimental results demonstrate that our method is consistent on maximum compression, and performs well in terms of generalization, robustness to adversarial attack, out-of-distribution detection, and supervised disentangling.

* Revised mathematical proof 
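
For orientation, the IB Lagrangian referred to in the abstract is usually written as below. The notation is an assumption, not taken from this paper: $X$ is the source, $Y$ the target, $T$ the learned representation, and $\beta > 0$ the Lagrangian multiplier. Sign and scaling conventions vary between papers.

```latex
\min_{p(t \mid x)} \; \mathcal{L}_{\mathrm{IB}}
  = \underbrace{I(X;T)}_{\text{compression}}
  \; - \; \beta \, \underbrace{I(T;Y)}_{\text{prediction}}
```

In this form a larger $\beta$ gives more weight to prediction, so $\beta$ sets the trade-off between compression and prediction. This is the multiplier the abstract describes as needing multiple tuning trials.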
