Daniel Deutsch


Pinpoint, Not Criticize: Refining Large Language Models via Fine-Grained Actionable Feedback

Nov 15, 2023
Wenda Xu, Daniel Deutsch, Mara Finkelstein, Juraj Juraska, Biao Zhang, Zhongtao Liu, William Yang Wang, Lei Li, Markus Freitag


There's no Data Like Better Data: Using QE Metrics for MT Data Filtering

Nov 09, 2023
Jan-Thorsten Peter, David Vilar, Daniel Deutsch, Mara Finkelstein, Juraj Juraska, Markus Freitag


The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics

Oct 30, 2023
Christoph Leiter, Juri Opitz, Daniel Deutsch, Yang Gao, Rotem Dror, Steffen Eger


Training and Meta-Evaluating Machine Translation Evaluation Metrics at the Paragraph Level

Aug 28, 2023
Daniel Deutsch, Juraj Juraska, Mara Finkelstein, Markus Freitag


The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation

Aug 14, 2023
Patrick Fernandes, Daniel Deutsch, Mara Finkelstein, Parker Riley, André F. T. Martins, Graham Neubig, Ankush Garg, Jonathan H. Clark, Markus Freitag, Orhan Firat


Ties Matter: Modifying Kendall's Tau for Modern Metric Meta-Evaluation

May 23, 2023
Daniel Deutsch, George Foster, Markus Freitag


Needle in a Haystack: An Analysis of Finding Qualified Workers on MTurk for Summarization

Dec 28, 2022
Lining Zhang, João Sedoc, Simon Mille, Yufang Hou, Sebastian Gehrmann, Daniel Deutsch, Elizabeth Clark, Yixin Liu, Miruna Clinciu, Saad Mahamood, Khyathi Chandu


On the Limitations of Reference-Free Evaluations of Generated Text

Oct 22, 2022
Daniel Deutsch, Rotem Dror, Dan Roth


GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Jun 24, 2022
Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna Kanerva, Jenny Chim, Jiawei Zhou, Jordan Clive, Joshua Maynez, João Sedoc, Juraj Juraska, Kaustubh Dhole, Khyathi Raghavi Chandu, Laura Perez-Beltrachini, Leonardo F. R. Ribeiro, Lewis Tunstall, Li Zhang, Mahima Pushkarna, Mathias Creutz, Michael White, Mihir Sanjay Kale, Moussa Kamal Eddine, Nico Daheim, Nishant Subramani, Ondrej Dusek, Paul Pu Liang, Pawan Sasanka Ammanamanchi, Qi Zhu, Ratish Puduppully, Reno Kriz, Rifat Shahriyar, Ronald Cardenas, Saad Mahamood, Salomey Osei, Samuel Cahyawijaya, Sanja Štajner, Sebastien Montella, Shailza, Shailza Jolly, Simon Mille, Tahmid Hasan, Tianhao Shen, Tosin Adewumi, Vikas Raunak, Vipul Raheja, Vitaly Nikolaev, Vivian Tsai, Yacine Jernite, Ying Xu, Yisi Sang, Yixin Liu, Yufang Hou
