Daniel Martin Katz

The KL3M Data Project: Copyright-Clean Training Resources for Large Language Models

Apr 10, 2025
Via arXiv

Precise Legal Sentence Boundary Detection for Retrieval at Scale: NUPunkt and CharBoundary

Apr 05, 2025
Via arXiv

KL3M Tokenizers: A Family of Domain-Specific and Character-Level Tokenizers for Legal, Financial, and Preprocessing Applications

Mar 21, 2025
Via arXiv

LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development

May 22, 2023
Via arXiv

Natural Language Processing in the Legal Domain

Feb 23, 2023
Via arXiv

GPT as Knowledge Worker: A Zero-Shot Evaluation of CPA Capabilities

Jan 11, 2023
Via arXiv

GPT Takes the Bar Exam

Dec 29, 2022
Via arXiv

Law Smells: Defining and Detecting Problematic Patterns in Legal Drafting

Oct 15, 2021
Via arXiv

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

Oct 13, 2021
Via arXiv

OpenEDGAR: Open Source Software for SEC EDGAR Analysis

Jun 13, 2018
Via arXiv