Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Anika Tasnim Rodela

TextEconomizer: Enhancing Lossy Text Compression with Denoising Transformers and Entropy Coding

Jun 06, 2026

Mahbub E Sobhani, Anika Tasnim Rodela, Chowdhury Mofizur Rahman, Dewan Md. Farid, Swakkhar Shatabda

Abstract:Lossy text compression reduces data size while preserving core meaning, making it well-suited for summarization, automated analysis, and digital archives. Despite the dominance of transformer-based models in language modeling, integrating context vectors and entropy coding into Sequence-to-Sequence (Seq2Seq) generation remains underexplored. A key challenge lies in identifying the most informative context vectors from encoder output and incorporating entropy coding to enhance storage efficiency while maintaining high-quality outputs, even under noisy text. We introduce TextEconomizer, an encoder-decoder framework paired with a transformer neural network that reduces variable-sized inputs by 50% to 80% without prior knowledge of dataset dimensions. Our model achieves competitive compression ratios via entropy coding while delivering near-perfect text quality, assessed by BLEU, ROUGE, METEOR, and semantic similarity scores. TextEconomizer operates with approximately 153x fewer parameters than comparable models, achieving a 5.39x compression ratio without sacrificing semantic quality. We also evaluate an LSTM-based autoencoder achieving a state-of-the-art 67x compression ratio with 196x fewer parameters, and LLaMAFormer, a modified transformer with 263x fewer parameters than ICAE while maintaining competitive text quality. TextEconomizer significantly surpasses existing transformer-based models in balancing memory efficiency and high-fidelity outputs, marking a breakthrough in lossy compression with optimal space utilization.

* Neural Networks, Vol. 203, 109111, 2026
* Published in Neural Networks (Elsevier), Vol. 203, 2026

Via

Access Paper or Ask Questions

An Enhanced Text Compression Approach Using Transformer-based Language Models

Dec 15, 2024

Chowdhury Mofizur Rahman, Mahbub E Sobhani, Anika Tasnim Rodela, Swakkhar Shatabda

Figure 1 for An Enhanced Text Compression Approach Using Transformer-based Language Models

Figure 2 for An Enhanced Text Compression Approach Using Transformer-based Language Models

Figure 3 for An Enhanced Text Compression Approach Using Transformer-based Language Models

Figure 4 for An Enhanced Text Compression Approach Using Transformer-based Language Models

Abstract:Text compression shrinks textual data while keeping crucial information, eradicating constraints on storage, bandwidth, and computational efficacy. The integration of lossless compression techniques with transformer-based text decompression has received negligible attention, despite the increasing volume of English text data in communication. The primary barrier in advancing text compression and restoration involves optimizing transformer-based approaches with efficient pre-processing and integrating lossless compression algorithms, that remained unresolved in the prior attempts. Here, we propose a transformer-based method named RejuvenateForme for text decompression, addressing prior issues by harnessing a new pre-processing technique and a lossless compression method. Our meticulous pre-processing technique incorporating the Lempel-Ziv-Welch algorithm achieves compression ratios of 12.57, 13.38, and 11.42 on the BookCorpus, EN-DE, and EN-FR corpora, thus showing state-of-the-art compression ratios compared to other deep learning and traditional approaches. Furthermore, the RejuvenateForme achieves a BLEU score of 27.31, 25.78, and 50.45 on the EN-DE, EN-FR, and BookCorpus corpora, showcasing its comprehensive efficacy. In contrast, the pre-trained T5-Small exhibits better performance over prior state-of-the-art models.

* 2024 IEEE Region 10 Symposium

Via

Access Paper or Ask Questions