Michael J Bommarito

Precise Legal Sentence Boundary Detection for Retrieval at Scale: NUPunkt and CharBoundary

Apr 05, 2025
Via arXiv

KL3M Tokenizers: A Family of Domain-Specific and Character-Level Tokenizers for Legal, Financial, and Preprocessing Applications

Mar 21, 2025
Via arXiv