Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Julian Moreno-Schneider

Evaluating Document Representations for Content-based Legal Literature Recommendations

Apr 28, 2021

Malte Ostendorff, Elliott Ash, Terry Ruas, Bela Gipp, Julian Moreno-Schneider, Georg Rehm

Figure 1 for Evaluating Document Representations for Content-based Legal Literature Recommendations

Figure 2 for Evaluating Document Representations for Content-based Legal Literature Recommendations

Figure 3 for Evaluating Document Representations for Content-based Legal Literature Recommendations

Figure 4 for Evaluating Document Representations for Content-based Legal Literature Recommendations

Abstract:Recommender systems assist legal professionals in finding relevant literature for supporting their case. Despite its importance for the profession, legal applications do not reflect the latest advances in recommender systems and representation learning research. Simultaneously, legal recommender systems are typically evaluated in small-scale user study without any public available benchmark datasets. Thus, these studies have limited reproducibility. To address the gap between research and practice, we explore a set of state-of-the-art document representation methods for the task of retrieving semantically related US case law. We evaluate text-based (e.g., fastText, Transformers), citation-based (e.g., DeepWalk, Poincar\'e), and hybrid methods. We compare in total 27 methods using two silver standards with annotations for 2,964 documents. The silver standards are newly created from Open Case Book and Wikisource and can be reused under an open license facilitating reproducibility. Our experiments show that document representations from averaged fastText word vectors (trained on legal corpora) yield the best results, closely followed by Poincar\'e citation embeddings. Combining fastText and Poincar\'e in a hybrid manner further improves the overall result. Besides the overall performance, we analyze the methods depending on document length, citation count, and the coverage of their recommendations. We make our source code, models, and datasets publicly available at https://github.com/malteos/legal-document-similarity/.

* Accepted for publication at ICAIL 2021

Via

Access Paper or Ask Questions

Enriching BERT with Knowledge Graph Embeddings for Document Classification

Sep 18, 2019

Malte Ostendorff, Peter Bourgonje, Maria Berger, Julian Moreno-Schneider, Georg Rehm, Bela Gipp

Figure 1 for Enriching BERT with Knowledge Graph Embeddings for Document Classification

Figure 2 for Enriching BERT with Knowledge Graph Embeddings for Document Classification

Figure 3 for Enriching BERT with Knowledge Graph Embeddings for Document Classification

Figure 4 for Enriching BERT with Knowledge Graph Embeddings for Document Classification

Abstract:In this paper, we focus on the classification of books using short descriptive texts (cover blurbs) and additional metadata. Building upon BERT, a deep neural language model, we demonstrate how to combine text representations with metadata and knowledge graph embeddings, which encode author information. Compared to the standard BERT approach we achieve considerably better results for the classification task. For a more coarse-grained classification using eight labels we achieve an F1- score of 87.20, while a detailed classification using 343 labels yields an F1-score of 64.70. We make the source code and trained models of our experiments publicly available

Via

Access Paper or Ask Questions