This year, the DEFT campaign (D\'efi Fouilles de Textes) incorporates a task which aims at identifying the session in which articles of previous TALN conferences were presented. We describe the three statistical systems developed at LIA/ADOC for this task. A fusion of these systems enables us to obtain interesting results (micro-precision score of 0.76 measured on the test corpus)
With the agreement of my coauthors, I Zhangyang Wang would like to withdraw the manuscript "Stacked Approximated Regression Machine: A Simple Deep Learning Approach". Some experimental procedures were not included in the manuscript, which makes a part of important claims not meaningful. In the relevant research, I was solely responsible for carrying out the experiments; the other coauthors joined in the discussions leading to the main algorithm. Please see the updated text for more details.
This paper presents a new Bayesian non-parametric model by extending the usage of Hierarchical Dirichlet Allocation to extract tree structured word clusters from text data. The inference algorithm of the model collects words in a cluster if they share similar distribution over documents. In our experiments, we observed meaningful hierarchical structures on NIPS corpus and radiology reports collected from public repositories.
Biometrics systems have been used in a wide range of applications and have improved people authentication. Signature verification is one of the most common biometric methods with techniques that employ various specifications of a signature. Recently, deep learning has achieved great success in many fields, such as image, sounds and text processing. In this paper, deep learning method has been used for feature extraction and feature selection.
We investigate the hypothesis that word representations ought to incorporate both distributional and relational semantics. To this end, we employ the Alternating Direction Method of Multipliers (ADMM), which flexibly optimizes a distributional objective on raw text and a relational objective on WordNet. Preliminary results on knowledge base completion, analogy tests, and parsing show that word representations trained on both objectives can give improvements in some cases.
The main contribution of this paper, is to propose a novel semantic approach based on a Natural Language Processing technique in order to ensure a semantic unification of unstructured process patterns which are expressed not only in different formats but also, in different forms. This approach is implemented using the GATE text engineering framework and then evaluated leading up to high-quality results motivating us to continue in this direction.
This paper describes the NECA MNLG; a fully implemented Multimodal Natural Language Generation module. The MNLG is deployed as part of the NECA system which generates dialogues between animated agents. The generation module supports the seamless integration of full grammar rules, templates and canned text. The generator takes input which allows for the specification of syntactic, semantic and pragmatic constraints on the output.
The mathematical distinction between prose and verse may be detected in writings that are not apparently lineated, for example in T. S. Eliot's "Burnt Norton", and Jim Crace's "Quarantine". In this paper we offer comments on appropriate statistical methods for such work, and also on the nature of formal innovation in these two texts. Additional remarks are made on the roots of lineation as a metrical form, and on the prose-verse continuum.
Automated text generation requires a underlying knowledge base from which to generate, which is often difficult to produce. Software documentation is one domain in which parts of this knowledge base may be derived automatically. In this paper, we describe \drafter, an authoring support tool for generating user-centred software documentation, and in particular, we describe how parts of its required knowledge base can be obtained automatically.