Software Archeology @ RIT

[ar·che·ol·o·gy] n. the study of people by way of their artifacts
Week 12 - Revised Sheriff Data, Developer Emails, and Security

27 Apr 2014

Revised Sheriff Data

A while ago, our team got access to the Chromium sheriff rotation calendars and scraped them to augment our other data, such as code reviews, CVEs, and developer information. However, we realized that for about half of the rotation events, the emails displayed in Google Calendar had been replaced by developer names in our spreadsheet. This was a problem because we currently track developers only by email, which is how they are referenced in the code reviews. When we compared some events in our scraped spreadsheet against the online calendar, we found that the scraper we had chosen appeared to replace the email with a name whenever one was available. While that is a useful feature in some cases, it was unhelpful for our research.

To fix this issue, we have now written our own scraper in Google Apps Script (to learn more about Google Apps Script, see http://www.google.com/script/start/). Using the provided APIs (application programming interfaces), we were able to write a script that is guaranteed to get the emails of the sheriffs in the rotation changes. This should improve the quality of our data, and in the long run our ability to make associations and draw conclusions from our metrics.
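To illustrate the idea (this is not our actual Apps Script code), here is a minimal Python sketch of "always take the email, never the display name." It assumes each event is a dictionary with an `attendees` list whose entries carry an `email` and an optional `displayName`, which is the shape the Google Calendar API uses for event resources. The function name `sheriff_emails` and the sample event are hypothetical.

```python
from typing import Dict, List


def sheriff_emails(event: Dict) -> List[str]:
    """Return the attendee emails for one calendar event, never display names."""
    emails: List[str] = []
    for attendee in event.get("attendees", []):
        # Read only the "email" field; "displayName" is deliberately ignored.
        email = attendee.get("email", "").strip().lower()
        if email and email not in emails:
            emails.append(email)
    return emails


# Hypothetical event: the second attendee has a display name, which a
# name-preferring scraper would have recorded instead of the address.
event = {
    "summary": "Sheriff rotation",
    "attendees": [
        {"email": "alice@example.org"},
        {"email": "Bob@example.org", "displayName": "Bob Example"},
    ],
}

print(sheriff_emails(event))
```

Lowercasing the address is an extra assumption on our part: it makes the result easier to match against the emails recorded in code reviews.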

Developer Emails

Our team has also been working to address the concern that developers might use multiple email addresses, such as temporary accounts. For instance, when we look at the emails of developers assigned to a code review, a single developer sometimes appears under more than one address. This is a problem because our database currently treats email as a unique column in our Developer table, so each extra address would be counted as a separate developer. To fix this, our team is working on filtering out temporary email addresses and duplicates.
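One way the filtering could work is sketched below in Python. This is an assumption-laden illustration, not our finished solution: it supposes we maintain a hand-built alias table that maps known temporary or secondary addresses to one canonical address, and the names `canonical_email`, `unique_developers`, and `ALIASES` are all hypothetical.

```python
from typing import Dict, Iterable, List


def canonical_email(raw: str, aliases: Dict[str, str]) -> str:
    """Normalize an address, then map known aliases to the canonical one."""
    cleaned = raw.strip().lower()
    return aliases.get(cleaned, cleaned)


def unique_developers(raw_emails: Iterable[str], aliases: Dict[str, str]) -> List[str]:
    """Collapse duplicates and aliases, keeping first-seen order."""
    seen = set()
    result: List[str] = []
    for raw in raw_emails:
        email = canonical_email(raw, aliases)
        if email not in seen:
            seen.add(email)
            result.append(email)
    return result


# Hypothetical alias table: temporary address -> canonical address.
ALIASES = {"alice.temp@example.org": "alice@example.org"}

reviewers = ["Alice@example.org", "alice.temp@example.org ", "bob@example.org"]
print(unique_developers(reviewers, ALIASES))
```

With a mapping like this, the Developer table can keep its unique email column, because every address is reduced to its canonical form before it is stored or looked up.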

Security

On the security side, our team has written methods to get more information on vulnerabilities. One method counts the number of vulnerability inspections a developer has taken part in, and another returns the number of developers who have reviewed vulnerabilities and have also inspected a particular filepath. Additionally, we have a method that returns the participants of a code review who also participated in a prior security-fixing code review. Each of these is useful, but the practical application of the last one is perhaps the easiest to see. If we can determine that a code review includes reviewers who have been involved in other security-related reviews, we can use that information in our metrics to see whether having such a person helps reduce the number of vulnerabilities later. This also relates to code familiarity and developer experience!
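The three methods described above can be sketched roughly as follows. This Python version runs over an in-memory list rather than our database, and everything in it is an assumption made for illustration: the `Review` class, its field names, the function names, treating a "vulnerability inspection" as a review flagged as fixing a vulnerability, and using increasing review IDs as a stand-in for time order.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Review:
    review_id: int                  # assumed to increase over time
    participants: Set[str]          # developer emails
    filepaths: Set[str]             # files touched by the review
    fixes_vulnerability: bool = False


def vulnerability_inspection_count(reviews: List[Review], developer: str) -> int:
    """Number of vulnerability-fixing reviews the developer took part in."""
    return sum(
        1 for r in reviews
        if r.fixes_vulnerability and developer in r.participants
    )


def security_experienced_on_path(reviews: List[Review], filepath: str) -> int:
    """Number of developers who reviewed a vulnerability and inspected filepath."""
    security = {d for r in reviews if r.fixes_vulnerability for d in r.participants}
    on_path = {d for r in reviews if filepath in r.filepaths for d in r.participants}
    return len(security & on_path)


def prior_security_participants(reviews: List[Review], target: Review) -> Set[str]:
    """Participants of target who were in an earlier security-fixing review."""
    prior = {
        d for r in reviews
        if r.fixes_vulnerability and r.review_id < target.review_id
        for d in r.participants
    }
    return target.participants & prior


# Hypothetical data: review 1 fixes a vulnerability, reviews 2 and 3 do not.
reviews = [
    Review(1, {"alice@example.org", "bob@example.org"}, {"net/http.cc"}, True),
    Review(2, {"carol@example.org"}, {"net/http.cc"}),
    Review(3, {"alice@example.org", "carol@example.org"}, {"ui/view.cc"}),
]

print(vulnerability_inspection_count(reviews, "alice@example.org"))
print(security_experienced_on_path(reviews, "ui/view.cc"))
print(prior_security_participants(reviews, reviews[2]))
```

In this toy data, review 3 has one participant with prior security experience, which is exactly the kind of signal we want to compare against later vulnerabilities.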
