Machine translation (MT) encompasses a variety of methodologies aimed at enhancing the accuracy of translations. In contrast, the process of human-generated translation relies on a wide range of translation techniques, which are crucial for ensuring linguistic adequacy and fluency. This study suggests that these translation techniques could further optimize machine translation if they are automatically identified before being applied to guide the translation process effectively. The study differentiates between two scenarios of the translation process: from-scratch translation and post-editing. For each scenario, a specific set of experiments has been designed to forecast the most appropriate translation techniques. The findings indicate that the predictive accuracy for from-scratch translation reaches 82%, while the post-editing process exhibits even greater potential, achieving an accuracy rate of 93%.
When applying automated speech recognition (ASR) for Belgian Dutch (Van Dyck et al. 2021), the output consists of an unsegmented stream of words, without any punctuation. A next step is to perform segmentation and insert punctuation, making the ASR output more readable and easy to manually correct. As far as we know there is no publicly available punctuation insertion system for Dutch that functions at a usable level. The model we present here is an extension of the models of Guhr et al. (2021) for Dutch and is made publicly available. We trained a sequence classification model, based on the Dutch language model RobBERT (Delobelle et al. 2020). For every word in the input sequence, the models predicts a punctuation marker that follows the word. We have also extended a multilingual model, for cases where the language is unknown or where code switching applies. When performing the task of segmentation, the application of the best models onto out of domain test data, a sliding window of 200 words of the ASR output stream is sent to the classifier, and segmentation is applied when the system predicts a segmenting punctuation sign with a ratio above threshold. Results show to be much better than a machine translation baseline approach.