A Curated Dataset of Security Defects in Scientific Software Projects

Published in 7th Annual Hot Topics in the Science of Security (HoTSoS) Symposium, 2020

The cybersecurity research community could benefit from a curated dataset in which commits mined from scientific software projects are labeled as security defects. We constructed such a dataset by mining 7,024 commits from 20 scientific software projects. The dataset can help cybersecurity researchers in two ways: (i) conducting security defect categorization and prediction research; and (ii) finding undiscovered security defects in scientific software projects.