Picture for Michael J Bommarito

Michael J Bommarito

Precise Legal Sentence Boundary Detection for Retrieval at Scale: NUPunkt and CharBoundary

Add code
Apr 05, 2025
Figure 1 for Precise Legal Sentence Boundary Detection for Retrieval at Scale: NUPunkt and CharBoundary
Figure 2 for Precise Legal Sentence Boundary Detection for Retrieval at Scale: NUPunkt and CharBoundary
Figure 3 for Precise Legal Sentence Boundary Detection for Retrieval at Scale: NUPunkt and CharBoundary
Figure 4 for Precise Legal Sentence Boundary Detection for Retrieval at Scale: NUPunkt and CharBoundary
Viaarxiv icon

KL3M Tokenizers: A Family of Domain-Specific and Character-Level Tokenizers for Legal, Financial, and Preprocessing Applications

Add code
Mar 21, 2025
Figure 1 for KL3M Tokenizers: A Family of Domain-Specific and Character-Level Tokenizers for Legal, Financial, and Preprocessing Applications
Figure 2 for KL3M Tokenizers: A Family of Domain-Specific and Character-Level Tokenizers for Legal, Financial, and Preprocessing Applications
Figure 3 for KL3M Tokenizers: A Family of Domain-Specific and Character-Level Tokenizers for Legal, Financial, and Preprocessing Applications
Figure 4 for KL3M Tokenizers: A Family of Domain-Specific and Character-Level Tokenizers for Legal, Financial, and Preprocessing Applications
Viaarxiv icon