Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sami Ul Haq

Audio-Based Crowd-Sourced Evaluation of Machine Translation Quality

Sep 17, 2025

Sami Ul Haq, Sheila Castilho, Yvette Graham

Abstract:Machine Translation (MT) has achieved remarkable performance, with growing interest in speech translation and multimodal approaches. However, despite these advancements, MT quality assessment remains largely text centric, typically relying on human experts who read and compare texts. Since many real-world MT applications (e.g Google Translate Voice Mode, iFLYTEK Translator) involve translation being spoken rather printed or read, a more natural way to assess translation quality would be through speech as opposed text-only evaluations. This study compares text-only and audio-based evaluations of 10 MT systems from the WMT General MT Shared Task, using crowd-sourced judgments collected via Amazon Mechanical Turk. We additionally, performed statistical significance testing and self-replication experiments to test reliability and consistency of audio-based approach. Crowd-sourced assessments based on audio yield rankings largely consistent with text only evaluations but, in some cases, identify significant differences between translation systems. We attribute this to speech richer, more natural modality and propose incorporating speech-based assessments into future MT evaluation frameworks.

* Accepted at WMT2025 (ENNLP) for oral presented

Via

Access Paper or Ask Questions

Long-context Reference-based MT Quality Estimation

Sep 17, 2025

Sami Ul Haq, Chinonso Cynthia Osuji, Sheila Castilho, Brian Davis

Abstract:In this paper, we present our submission to the Tenth Conference on Machine Translation (WMT25) Shared Task on Automated Translation Quality Evaluation. Our systems are built upon the COMET framework and trained to predict segment-level Error Span Annotation (ESA) scores using augmented long-context data. To construct long-context training data, we concatenate in-domain, human-annotated sentences and compute a weighted average of their scores. We integrate multiple human judgment datasets (MQM, SQM, and DA) by normalising their scales and train multilingual regression models to predict quality scores from the source, hypothesis, and reference translations. Experimental results show that incorporating long-context information improves correlations with human judgments compared to models trained only on short segments.

Via

Access Paper or Ask Questions

AttCDCNet: Attention-enhanced Chest Disease Classification using X-Ray Images

Oct 20, 2024

Omar Hesham Khater, Abdullahi Sani Shuaib, Sami Ul Haq, Abdul Jabbar Siddiqui

Figure 1 for AttCDCNet: Attention-enhanced Chest Disease Classification using X-Ray Images

Figure 2 for AttCDCNet: Attention-enhanced Chest Disease Classification using X-Ray Images

Figure 3 for AttCDCNet: Attention-enhanced Chest Disease Classification using X-Ray Images

Figure 4 for AttCDCNet: Attention-enhanced Chest Disease Classification using X-Ray Images

Abstract:Chest X-rays (X-ray images) have been proven to be effective for the diagnosis of chest diseases, including Pneumonia, Lung Opacity, and COVID-19. However, relying on traditional medical methods for diagnosis from X-ray images is prone to delays and inaccuracies because the medical personnel who evaluate the X-ray images may have preconceived biases. For this reason, researchers have proposed the use of deep learning-based techniques to facilitate the diagnosis process. The preeminent method is the use of sophisticated Convolutional Neural Networks (CNNs). In this paper, we propose a novel detection model named \textbf{AttCDCNet} for the task of X-ray image diagnosis, enhancing the popular DenseNet121 model by adding an attention block to help the model focus on the most relevant regions, using focal loss as a loss function to overcome the imbalance of the dataset problem, and utilizing depth-wise convolution to reduce the parameters to make the model lighter. Through extensive experimental evaluations, the proposed model demonstrates exceptional performance, showing better results than the original DenseNet121. The proposed model achieved an accuracy, precision and recall of 94.94%, 95.14% and 94.53%, respectively, on the COVID-19 Radiography Dataset.

Via

Access Paper or Ask Questions