Melih Sirlanci

The Ohio State University

EXHIB: A Benchmark for Realistic and Diverse Evaluation of Function Similarity in the Wild

Apr 02, 2026

Yiming Fan, Jun Yeon Won, Ding Zhu, Melih Sirlanci, Mahdi Khalili, Carter Yagemann

Abstract: Binary Function Similarity Detection (BFSD) is a core problem in software security, supporting tasks such as vulnerability analysis, malware classification, and patch provenance. In the past few decades, numerous models and tools have been developed for this application; however, due to the lack of a comprehensive universal benchmark in this field, researchers have struggled to compare different models effectively. Existing datasets are limited in scope, often focusing on a narrow set of transformations or types of binaries, and fail to reflect the full diversity of real-world applications. We introduce EXHIB, a benchmark comprising five realistic datasets collected from the wild, each highlighting a distinct aspect of the BFSD problem space. We evaluate 9 representative models spanning multiple BFSD paradigms on EXHIB and observe performance degradations of up to 30% on firmware and semantic datasets compared to standard settings, revealing substantial generalization gaps. Our results show that robustness to low- and mid-level binary variations does not generalize to high-level semantic differences, underscoring a critical blind spot in current BFSD evaluation practices.

* 13 pages, 7 figures. This is a technical report for the EXHIB benchmark. Code and data are available at https://github.com/fan1192/bfsd-anon-artifact

C2RUST-BENCH: A Minimized, Representative Dataset for C-to-Rust Transpilation Evaluation

Apr 21, 2025

Melih Sirlanci, Carter Yagemann, Zhiqiang Lin

Abstract: Despite two decades of effort in vulnerability detection, memory-safety vulnerabilities remain a critical problem. Recent reports suggest that the key solution is to migrate to memory-safe languages. To this end, C-to-Rust transpilation has become a popular way to resolve memory-safety issues in C programs. Recent works propose C-to-Rust transpilation frameworks; however, a comprehensive evaluation dataset is missing. Although one solution is to assemble a sufficiently large dataset, this increases analysis time for automated frameworks and, in some cases, for manual effort as well. In this work, we build a method that selects functions from a large set to construct a minimized yet representative dataset for evaluating C-to-Rust transpilation. We propose C2RUST-BENCH, which contains 2,905 functions that are representative of C-to-Rust transpilation, selected from 15,503 functions of real-world programs.

Malicious Code Detection: Run Trace Output Analysis by LSTM

Jan 14, 2021

Cengiz Acarturk, Melih Sirlanci, Pinar Gurkan Balikcioglu, Deniz Demirci, Nazenin Sahin, Ozge Acar Kucuk

Abstract: Malicious software threats and their detection have been gaining importance as a subdomain of information security due to the expansion of ICT applications in daily settings. A major challenge in designing and developing anti-malware systems is the coverage of the detection, particularly the development of dynamic analysis methods that can detect polymorphic and metamorphic malware efficiently. In the present study, we propose a methodological framework for detecting malicious code by analyzing run trace outputs with Long Short-Term Memory (LSTM). We developed models of run traces of malicious and benign Portable Executable (PE) files. We created our dataset from run trace outputs obtained from dynamic analysis of PE files. The resulting dataset represents each trace as a sequence of instructions and is called the Instruction as a Sequence Model (ISM). By splitting this first dataset into basic blocks, we obtained a second one, called the Basic Block as a Sequence Model (BSM). The experiments showed that the ISM achieved an accuracy of 87.51% and a false positive rate of 18.34%, while the BSM achieved an accuracy of 99.26% and a false positive rate of 2.62%.

* 11 pages, 5 figures, 5 tables, accepted to IEEE Access
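
The step from ISM to BSM described in the abstract, splitting an instruction sequence into basic blocks, can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not state the splitting rule, so the terminator set, the function name `split_into_basic_blocks`, and the sample trace below are all assumptions chosen for illustration. The sketch simply closes a block whenever it sees a control-transfer mnemonic.

```python
from typing import List

# Hypothetical set of control-transfer mnemonics that close a block.
# The paper's actual splitting rule is not given in the abstract.
BLOCK_TERMINATORS = {"jmp", "je", "jne", "jz", "jnz", "call", "ret"}


def split_into_basic_blocks(trace: List[str]) -> List[List[str]]:
    """Group a flat instruction trace into blocks ending at control transfers."""
    blocks: List[List[str]] = []
    current: List[str] = []
    for instruction in trace:
        parts = instruction.split()
        if not parts:  # skip blank lines in the trace
            continue
        current.append(instruction)
        if parts[0].lower() in BLOCK_TERMINATORS:
            blocks.append(current)
            current = []
    if current:  # trailing instructions with no terminator
        blocks.append(current)
    return blocks


# Illustrative trace (made up, not taken from the paper's dataset).
trace = [
    "push ebp",
    "mov ebp, esp",
    "call 0x401000",
    "test eax, eax",
    "jne 0x401020",
    "ret",
]
blocks = split_into_basic_blocks(trace)
for block in blocks:
    print(block)
```

Under this assumed rule, the flat trace corresponds to an ISM-style input (one token per instruction), while `blocks` corresponds to a BSM-style input (one token per block). A static disassembler would also split at branch targets; the sketch omits that because a run trace is already a linear execution path.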
