Title: Who Evaluates the Evaluators? On Automatic Metrics for Assessing AI-based Offensive Code Generators

Dec 12, 2022

Cristina Improta, Pietro Liguori, Roberto Natella, Bojan Cukic, Domenico Cotroneo

Abstract: AI-based code generators are an emerging solution for automatically writing programs from natural-language descriptions, using deep neural networks (Neural Machine Translation, NMT). In particular, code generators have been used for ethical hacking and offensive security testing by generating proof-of-concept attacks. Unfortunately, the evaluation of code generators still faces several issues. The current practice uses automatic metrics, which compute the textual similarity of generated code with ground-truth references. However, it is not clear which metric to use, or which metric is most suitable for specific contexts. This practical experience report analyzes a large set of output similarity metrics on offensive code generators. We apply the metrics to two state-of-the-art NMT models using two datasets containing offensive assembly and Python code with their descriptions in English. We compare the estimates from the automatic metrics with human evaluation and provide practical insights into their strengths and limitations.
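To make the idea of an output similarity metric concrete, here is a minimal sketch in Python. It is a generic illustration, not necessarily one of the metrics analyzed in the paper: the function names `exact_match` and `token_similarity`, the whitespace tokenization, and the sample snippets are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def exact_match(generated: str, reference: str) -> bool:
    """Return True when both snippets have identical whitespace-separated tokens."""
    return generated.split() == reference.split()


def token_similarity(generated: str, reference: str) -> float:
    """Return a similarity score in [0, 1] computed over whitespace-separated tokens.

    Uses difflib's ratio (twice the number of matching tokens divided by the
    total number of tokens in both sequences). This is an illustrative choice.
    """
    matcher = SequenceMatcher(None, generated.split(), reference.split(), autojunk=False)
    return matcher.ratio()


# Hypothetical ground-truth reference and two hypothetical model outputs.
reference = "xor eax, eax"
same_text = "xor  eax,  eax"   # differs from the reference only in spacing
same_effect = "mov eax, 0"     # different text, but also sets eax to zero

print(exact_match(same_text, reference), token_similarity(same_text, reference))
print(exact_match(same_effect, reference), token_similarity(same_effect, reference))
```

In this sketch the second candidate scores low even though it also zeroes the register, which hints at why scores based on textual similarity can diverge from a human judgment of the generated code.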
